GENOMIC SEQUENCES UPSTREAM OF THE CODING REGION OF THE IFN-ALPHA2 GENE FOR PROTEIN PRODUCTION AND DELIVERY

Field of the Invention 
This invention relates to genomic DNA.

Background of the Invention

Current approaches to treating disease with therapeutic proteins include both administration of proteins produced in vitro and gene therapy. In vitro production of a protein generally involves the introduction of exogenous DNA coding for the protein of interest into appropriate host cells in culture. Gene therapy methods, on the other hand, involve administering to a patient cells, plasmids, or viruses that contain a sequence encoding the therapeutic protein of interest.

Certain therapeutic proteins may also be produced by altering the expression of their endogenous genes in a desired manner with gene targeting techniques. See, e.g., U.S. Patent Nos. 5,641,670, 5,733,761, and 5,272,071, U.S. Patent Application Serial No. 08/406,030, WO 91/06666, WO 91/06667, and WO 90/11354, all of which are incorporated by reference in their entirety.



Summary of the Invention 
The present invention is based upon the identification and sequencing of genomic DNA 5' to the coding sequence of the human interferon-α2 ("IFNA2") gene. This DNA can be used, for example, in a DNA construct that alters (e.g., increases) expression of an endogenous IFNA2 gene in a mammalian cell upon integration into the genome of the cell via homologous recombination. "Endogenous IFNA2 gene" refers to a genomic (i.e., chromosomal) gene that encodes IFNA2. The construct contains a targeting sequence including or derived from the newly disclosed 5' noncoding sequence, and a transcriptional regulatory sequence. The transcriptional regulatory sequence preferably differs in sequence from the transcriptional regulatory sequence of the endogenous IFNA2 gene. The targeting sequence directs the integration of the regulatory sequence into a region upstream of the endogenous IFNA2-coding sequence such that the regulatory sequence becomes operatively linked to the endogenous coding sequence. By "operatively linked" is meant that the regulatory sequence can direct expression of the endogenous IFNA2-coding sequence. The construct may additionally contain a selectable marker gene to facilitate selection of cells that have stably integrated the construct, and/or another coding sequence linked to a promoter.

WO 99/57292 PCT/US99/09925

In one embodiment, the DNA construct comprises: (a) a targeting sequence, (b) a regulatory sequence, (c) an exon, (d) a splice-donor site, (e) an intron, and (f) a splice-acceptor site, wherein the targeting sequence directs the integration of itself and elements (b)-(f) such that elements (b)-(f) are within or upstream of the endogenous gene. The regulatory sequence then directs production of a transcript that includes not only elements (c)-(f), but also the endogenous IFNA2 coding sequence. Preferably, the intron and the splice-acceptor site are situated in the construct downstream from the splice-donor site.

The targeting sequence is homologous to a pre-selected target site in the genome with which homologous recombination is to occur; it contains at least 20 (e.g., at least 30, 50, 100, or 1000) contiguous nucleotides of SEQ ID NO:12, and can contain, for instance, at least 20 (e.g., at least 30, 50, or 100) contiguous nucleotides of SEQ ID NO:7, at least 20 (e.g., at least 30 or 50) contiguous nucleotides of SEQ ID NO:8, or at least 20 (e.g., at least 30, 50, 100, or 1000) contiguous nucleotides of SEQ ID NO:13. In addition, the targeting sequence can contain at least 20 (e.g., at least 30, 50, or 100) contiguous nucleotides of SEQ ID NO:16, at least 20 contiguous nucleotides of SEQ ID NO:17, at least 20 (e.g., at least 30 or 50) contiguous nucleotides of SEQ ID NO:18, or at least 20 (e.g., at least 30, 50, 100, or 1000) contiguous nucleotides of SEQ ID NO:19. SEQ ID NO:7 corresponds to nucleotides 1 to 278 of SEQ ID NO:12; SEQ ID NO:8 corresponds to nucleotides 3492 to 3564 of SEQ ID NO:12; and SEQ ID NO:13 corresponds to nucleotides 279 to 3491 of SEQ ID NO:12. By "homologous" is meant that the targeting sequence is identical or sufficiently similar to its genomic target site so that the targeting sequence and target site can undergo homologous recombination within a human cell; a small percentage of base-pair mismatches is acceptable, as long as homologous recombination can occur at a useful frequency. To facilitate homologous recombination, the targeting sequence is preferably at least about 20 (e.g., at least 50, 100, 250, 400, or 1,000) base pairs ("bp") long. The targeting sequence can also include genomic sequences from outside the region covered by SEQ ID NO:12, so long as it includes at least 20 nucleotides from within this region. For example, additional targeting sequence could be derived from the sequence lying between SEQ ID NO:8 and the endogenous transcription initiation sequence of the IFNA2 gene.
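
The "at least 20 contiguous nucleotides" test used throughout this paragraph can be checked mechanically. The sketch below is one way to do it, assuming both sequences are plain A/C/G/T strings written on the same strand: it finds the longest stretch of a candidate targeting sequence that also occurs verbatim in a reference such as SEQ ID NO:12, using the standard longest-common-substring dynamic program. The function names are illustrative only, and the check deliberately ignores the mismatch tolerance that "homologous" allows.

```python
def longest_shared_run(candidate: str, reference: str) -> int:
    """Length of the longest contiguous stretch of `candidate` that also occurs in `reference`."""
    candidate, reference = candidate.upper(), reference.upper()
    best = 0
    # prev[j] = length of the common suffix ending at candidate[i-2] and reference[j-1]
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(candidate) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            if candidate[i - 1] == reference[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def meets_contiguity_threshold(candidate: str, reference: str, k: int = 20) -> bool:
    """True if `candidate` contains at least `k` contiguous nucleotides of `reference`."""
    return longest_shared_run(candidate, reference) >= k
```

The quadratic running time is acceptable for sequences of a few kilobases such as SEQ ID NO:12; a suffix-automaton approach would be the usual choice for much longer references.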

Due to polymorphism that exists at the IFNA2 genetic locus, minor variations in the nucleotide composition of any given genomic target site may occur in any given mammalian species. Targeting sequences that correspond to such polymorphic variants of SEQ ID NOs:7, 8, 12, 13, 16, 17, 18, and 19 (particularly human polymorphic variants) are within the scope of this invention.

Upon homologous recombination, the regulatory sequence of the construct is integrated into a pre-selected region upstream of the coding sequence of an IFNA2 gene in a chromosome of a cell. The resulting new transcription unit containing the construct-derived regulatory sequence alters the expression of the target IFNA2 gene. The IFNA2 protein so produced may be identical in sequence to the IFNA2 protein encoded by the unaltered, endogenous gene, or may contain additional, substituted, or fewer amino acid residues as compared to the wild-type IFNA2 protein, due to changes introduced as a result of homologous recombination.

Altering gene expression encompasses activating (or causing to be expressed) a gene which is normally silent (i.e., essentially unexpressed) in the cell as obtained, increasing or decreasing the expression level of a gene, and changing the regulation pattern of a gene such that the pattern is different from that in the cell as obtained. "Cell as obtained" refers to the cell prior to homologous recombination.

Also within the scope of the invention is a method of using the present DNA construct to alter expression of an endogenous IFNA2 gene in a mammalian cell. This method includes the steps of (i) introducing the DNA construct into the mammalian cell; (ii) maintaining the cell under conditions that permit homologous recombination to occur between the construct and a genomic target site homologous to the targeting sequence, to produce a homologously recombinant cell; and (iii) maintaining the homologously recombinant cell under conditions that permit expression of the IFNA2 coding sequence under the control of the construct-derived regulatory sequence. At least a part of the genomic target site is 5' to the coding sequence of an endogenous IFNA2 gene. That is, the genomic target site can contain coding sequence as well as 5' non-coding sequence.

The invention also features transfected or infected cells in which the construct has undergone homologous recombination with genomic DNA upstream of the endogenous ATG initiation codon in one or both alleles of the endogenous IFNA2 gene. Such transfected or infected cells, also called homologously recombinant cells, have an altered IFNA2 expression pattern. These cells are particularly useful for in vitro IFNA2 production and for delivering IFNA2 via gene therapy. Methods of making and using such cells are also embraced by the invention. The cells can be of vertebrate origin, such as mammalian (e.g., human, non-human primate, cow, pig, horse, goat, sheep, cat, dog, rabbit, mouse, guinea pig, hamster, or rat) origin.

The invention further relates to a method of producing a mammalian IFNA2 protein in vitro or in vivo by introducing the above-described construct into the genome of a host cell via homologous recombination. The homologously recombinant cell is then maintained under conditions that allow transcription, translation, and, optionally, secretion of the IFNA2 protein.

The invention also features isolated nucleic acids comprising a sequence of at least 20 (e.g., at least 30, 50, 100, 200, or 1000) contiguous nucleotides of SEQ ID NO:12 or its complement, or of a sequence identical to SEQ ID NO:12 except for polymorphic variations or other minor variations (e.g., less than 5% of the sequence) which do not prevent homologous recombination with the target sequence. For instance, the isolated DNA can contain at least 20 (e.g., at least 30, 50, or 100) contiguous nucleotides of SEQ ID NO:7 or its complement, at least 20 (e.g., at least 30 or 50) contiguous nucleotides of SEQ ID NO:8 or its complement, at least 20 (e.g., at least 30, 50, 100, or 1000) contiguous nucleotides of SEQ ID NO:13 or its complement, at least 20 (e.g., at least 30, 50, or 100) contiguous nucleotides of SEQ ID NO:16 or its complement, at least 20 contiguous nucleotides of SEQ ID NO:17 or its complement, at least 20 (e.g., at least 30 or 50) contiguous nucleotides of SEQ ID NO:18 or its complement, or at least 20 (e.g., at least 30, 50, 100, or 1000) contiguous nucleotides of SEQ ID NO:19 or its complement.

In one embodiment, the isolated nucleic acid of the invention includes a contiguous 100 bp block of SEQ ID NO:12. For example, the isolated DNA can contain nucleotides 1 to 100, 101 to 200, 201 to 300, 301 to 400, 401 to 500, 501 to 600, 601 to 700, 701 to 800, 801 to 900, 901 to 1000, 1001 to 1100, 1101 to 1200, 1201 to 1300, 1301 to 1400, 1401 to 1500, 1501 to 1600, 1601 to 1700, 1701 to 1800, 1801 to 1900, 1901 to 2000, 2001 to 2100, 2101 to 2200, 2201 to 2300, 2301 to 2400, 2401 to 2500, 2501 to 2600, 2601 to 2700, 2701 to 2800, 2801 to 2900, 2901 to 3000, 3001 to 3100, 3101 to 3200, 3201 to 3300, 3301 to 3400, 3401 to 3500, or 3465 to 3564 of SEQ ID NO:12 or its complement. These blocks of SEQ ID NO:12 or its complement are also useful as targeting sequences in the constructs of the invention.
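
The block list follows a simple rule: consecutive, non-overlapping 100 bp windows starting at nucleotide 1, plus one final window anchored at the 3' end (3465 to 3564) so that the last 64 nucleotides of the 3,564-nucleotide SEQ ID NO:12 are also covered. The sketch below regenerates those 1-based, inclusive coordinates; the function name is hypothetical. Given SEQ ID NO:12 as a Python string `seq12` (not reproduced in this section), each block would be `seq12[start - 1:end]`.

```python
def tile_blocks(seq_len: int, block: int = 100) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) windows of `block` nt covering a sequence of `seq_len` nt.

    Full windows start at 1, block + 1, 2 * block + 1, ...; if the length is not a
    multiple of `block`, one extra window ending exactly at `seq_len` is appended.
    """
    if seq_len < block:
        raise ValueError("sequence is shorter than one block")
    windows = [(s, s + block - 1) for s in range(1, seq_len - block + 2, block)]
    if windows[-1][1] != seq_len:
        windows.append((seq_len - block + 1, seq_len))
    return windows
```

For `tile_blocks(3564)` this yields the 36 blocks listed above, the last being (3465, 3564).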

In the isolated DNA, the SEQ ID NO:12-derived sequence is not linked to a sequence encoding intact IFNA2, or at least is not linked in the same configuration (i.e., separated by the same noncoding sequence) as occurs in any wild-type genome. The term "isolated DNA", as used herein, thus does not denote a chromosome or a large piece of genomic DNA (as might be incorporated into a cosmid or yeast artificial chromosome) that includes not only part or all of SEQ ID NO:12, but also an intact IFNA2-coding sequence and all of the sequence which lies between the IFNA2 coding sequence and the sequence corresponding to SEQ ID NO:12 as it exists in the genome of a cell. It does include, but is not limited to, a DNA (i) which is incorporated into a plasmid or virus; or (ii) which exists as a separate molecule independent of other sequences, e.g., a fragment produced by polymerase chain reaction ("PCR") or restriction endonuclease treatment. The isolated DNA preferably does not contain a sequence which encodes intact IFNA2 precursor (i.e., IFNA2 complete with its endogenous secretion signal peptide).

The invention also includes isolated DNA comprising a strand which contains a sequence that is at least 100 (e.g., at least 200, 400, or 1000) nucleotides in length and that hybridizes under either moderately stringent or highly stringent conditions with SEQ ID NO:7, 8, 12, 13, 16, 17, 18, and/or 19, or the complement of SEQ ID NO:7, 8, 12, 13, 16, 17, 18, and/or 19. The sequence is not linked to an IFNA2-coding sequence, or at least is not linked in the same configuration as occurs in any wild-type genome. By moderately stringent conditions is meant hybridization at 50°C in Church buffer (7% SDS, 0.5% NaHPO4, 1 M EDTA, 1% bovine serum albumin) and washing at 50°C in 2X SSC. Highly stringent conditions are defined as: hybridization at 42°C in the presence of 50% formamide; a first wash at 65°C with 2X SSC containing 1% SDS; followed by a second wash at 65°C with 0.1X SSC.

Also embraced by the invention is isolated DNA comprising a strand which contains a sequence that (1) is at least 50 (e.g., at least 70 or 100) nucleotides in length and (2) shares at least 80% (e.g., at least 85%, 90%, or 98%) sequence identity with a fragment or all of SEQ ID NO:12, or with the complement of the fragment. This fragment can include, for instance, a part or all of SEQ ID NO:7, 8, 13, 16, 17, 18, or 19. The sequence is not linked to an intact IFNA2-coding sequence, or at least is not linked in the same configuration as occurs in any wild-type genome.

Where a particular polypeptide or nucleic acid molecule is said to have a specific percent identity or conservation to a reference polypeptide or nucleic acid molecule, the percent identity or conservation is determined by the algorithm of Myers and Miller, CABIOS (1989), which is embodied in the ALIGN program (version 2.0), or its equivalent, using a gap length penalty of 12 and a gap penalty of 4 where such parameters are required. All other parameters are set to their default positions. Access to ALIGN is readily available. See, e.g., http://www2.igh.cnrs.fr/bin/align-guess.cgi on the Internet.
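
Percent identity under this definition rests on an optimal global alignment with affine gap penalties. The sketch below uses Gotoh's quadratic-space recurrence in place of ALIGN's linear-space Myers and Miller implementation; both find an alignment with the same optimal score, although ties between equally scoring alignments may be broken differently. Several choices here are assumptions rather than ALIGN's documented behavior: nucleotide scores of +5 for a match and -4 for a mismatch, reading the two stated penalties as 12 to open a gap plus 4 per gap position, and dividing identical columns by the total number of alignment columns. These should be checked against the ALIGN version actually used before relying on the numbers.

```python
NEG = float("-inf")


def _best(options):
    """Return the (score, state) pair with the highest score; the first listed wins ties."""
    return max(options, key=lambda pair: pair[0])


def percent_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_extend=4):
    """Percent identity of an optimal global alignment of `a` and `b` with affine gap costs.

    Assumed convention: a gap of length L costs gap_open + gap_extend * L.
    Identity is reported as 100 * identical columns / alignment columns.
    """
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # M: a[i-1] paired with b[j-1]; X: a[i-1] against a gap; Y: b[j-1] against a gap.
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    # Traceback pointers record which matrix each optimal cell was reached from.
    PM = [[None] * (m + 1) for _ in range(n + 1)]
    PX = [[None] * (m + 1) for _ in range(n + 1)]
    PY = [[None] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)
        PX[i][0] = "X" if i > 1 else "M"
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)
        PY[0][j] = "Y" if j > 1 else "M"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            best, PM[i][j] = _best([(M[i - 1][j - 1], "M"), (X[i - 1][j - 1], "X"),
                                    (Y[i - 1][j - 1], "Y")])
            M[i][j] = best + s
            X[i][j], PX[i][j] = _best([(M[i - 1][j] - gap_open - gap_extend, "M"),
                                       (X[i - 1][j] - gap_extend, "X"),
                                       (Y[i - 1][j] - gap_open - gap_extend, "Y")])
            Y[i][j], PY[i][j] = _best([(M[i][j - 1] - gap_open - gap_extend, "M"),
                                       (Y[i][j - 1] - gap_extend, "Y"),
                                       (X[i][j - 1] - gap_open - gap_extend, "X")])
    # Trace back from the best-scoring state in the bottom-right cell.
    _, state = _best([(M[n][m], "M"), (X[n][m], "X"), (Y[n][m], "Y")])
    i, j, identities, columns = n, m, 0, 0
    while i > 0 or j > 0:
        columns += 1
        if state == "M":
            identities += a[i - 1] == b[j - 1]
            state, i, j = PM[i][j], i - 1, j - 1
        elif state == "X":
            state, i = PX[i][j], i - 1
        else:
            state, j = PY[i][j], j - 1
    return 100.0 * identities / columns
```

A candidate would then satisfy the 80% criterion if `percent_identity(candidate, fragment) >= 80.0`, where `fragment` is the chosen part of SEQ ID NO:12 or its complement.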

The invention also features a method of delivering IFNA2 to an animal (e.g., a mammal such as a human, non-human primate, cow, pig, horse, goat, sheep, cat, dog, rabbit, mouse, guinea pig, hamster, or rat) by providing a cell whose endogenous IFNA2 gene has been activated as described herein, and implanting the cell in the animal, where the cell secretes IFNA2. Also included in the invention is a method of producing IFNA2 by providing a cell whose endogenous IFNA2 gene has been activated as described herein, and culturing the cell in vitro under conditions which permit the cell to express and secrete IFNA2.

The invention further includes isolated DNA that shares at least 80% (e.g., at least 85%, 90%, or 95%) sequence identity, or hybridizes under highly or moderately stringent conditions, with a portion (e.g., at least about 20, 50, 100, 400, or 1000 bp in length) of the HindIII-BamHI insert of plasmid pA2HB (described below). The 3' end of this portion of the insert is at least 511 bp upstream of the ATG translation initiation codon of the IFNA2-coding sequence included in the plasmid insert.

The isolated DNA of the invention can be used, for example, as a source of an upstream PCR primer for use (when combined with a suitable downstream primer) in obtaining the regulatory region and/or complete coding sequence of an endogenous IFNA2 gene, or as a hybridization probe for indicating the presence of chromosome 9 in a preparation of human chromosomes. It can also be used, as described below, in a method for altering the expression of an endogenous IFNA2 gene in a vertebrate cell.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Exemplary methods and materials are described below, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control. The materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the following detailed description, and from the claims.

Brief Description of the Drawings 
Fig. 1 is a representation of the published sequence (SEQ ID NO:1) of a human IFNA2-coding sequence and some flanking 5' and 3' non-coding sequences (GenBank HUMIFNAA). Sequences of PCR primers ... and IFN7 are indicated by arrows.

Fig. 2 is a schematic diagram showing the human IFNA2 genomic region encompassed by the insert of plasmid pA2HB.

Fig. 3 is a representation of the nucleotide sequence (SEQ ID NO:7) of a region upstream of the coding sequence of a human IFNA2 gene. This nucleotide sequence has not been reported previously.

Fig. 4 is a representation of a sequence (SEQ ID NO: ...) of a human IFNA2-coding sequence and some flanking sequences. The underlined sequence (SEQ ID NO:8) has not been previously reported. The polypeptide sequence (SEQ ID NO:2) encoded by this gene is also shown. The N-terminus of the mature polypeptide is indicated by "Mature."

Fig. 5 is a schematic diagram showing a construct of the invention. The construct contains a first targeting sequence (1); an amplifiable marker gene (AM); a selectable marker gene (SM); a regulatory sequence; a CAP site; an exon; a splice-donor site (SD); an intron; a splice-acceptor site (SA); and a second targeting sequence (2). The black boxes represent coding sequences, and the stippled boxes represent transcribed but untranslated regions.

Fig. 6 is a representation of a sequence (SEQ ID NO: ...) of a human genomic sequence 5' to the IFNA2 coding sequence, including some coding sequence. The underlined sequence has been previously reported, while the sequence that is not underlined (-4074 to -511; SEQ ID NO:12) is new. The framed sequence is SEQ ID NO:13. The sequence 5' to SEQ ID NO:13 is SEQ ID NO:7. The sequence between the framed sequence and the underlined sequence is SEQ ID NO:8. Nucleotides -4074 to -3270 are SEQ ID NO:16. Nucleotides ... to -3239 are SEQ ID NO:17. Nucleotides -3241 to -3137 are SEQ ID NO:18. Nucleotides -3139 to -511 are SEQ ID NO:19.

Fig. 7 is a representation of a first targeting sequence (SEQ ID NO:14) used in a construct of the invention.

Fig. 8 is a representation of a second targeting sequence (SEQ ID NO:15) used in a construct of the invention.



Detailed Description

The present invention is based on the discovery of the nucleotide composition of sequences upstream of the coding sequence of a human IFNA2 gene.

Interferon-α constitutes a complex gene family with 14 genes clustered on the short arm of chromosome 9. None of these genes, including the IFNA2 gene, has introns. Interferon-α is produced by macrophages, T and B cells, and a variety of other cells. Interferon-α has considerable antiviral effects, and has been shown to be efficacious in treating infections by papilloma virus, hepatitis B and C viruses, vaccinia, herpes simplex virus, herpes zoster varicellosus virus, and rhinovirus.

The human IFNA2 gene encodes a 188-amino-acid precursor protein (SEQ ID NO:2) containing a 23-amino-acid signal peptide. The genomic map of the human IFNA2 gene is shown in Fig. 1. The map is constructed based on 1,733 bp of published sequence (HUMIFNAA, GenBank accession numbers J00207 and V00544; SEQ ID NO:1), which begins at position -510 relative to the translational start site (unless otherwise specified, all positions referred to herein are relative to the translational start site) and ends at position +1,223. The cap site is located at position -67.
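
The position numbers only add up if there is no position 0: the A of the ATG is +1 and the nucleotide immediately 5' of it is -1, so -510 to +1,223 spans 1,733 bp rather than 1,734. The sketch below encodes that reading (an inference from the stated lengths, not something the text says outright) and maps between these positions and 1-based indices of SEQ ID NO:12, which runs from -4074 to -511. The same rule reproduces the 332 bp probe (-263 to +69) and the 215 bp PCR product (+639 to +853) described later. Function and constant names are illustrative.

```python
SEQ12_FIRST_POSITION = -4074  # position of nucleotide 1 of SEQ ID NO:12 (from the text)
SEQ12_LENGTH = 3564           # nucleotides -4074 to -511


def span_length(start: int, end: int) -> int:
    """Nucleotides from `start` to `end` inclusive when the numbering skips 0 (-1 is followed by +1)."""
    if start == 0 or end == 0 or end < start:
        raise ValueError("positions must be non-zero, with start <= end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # no position 0 between -1 and +1
    return length


def seq12_to_position(n: int) -> int:
    """Position relative to the A of the ATG for nucleotide n (1-based) of SEQ ID NO:12."""
    if not 1 <= n <= SEQ12_LENGTH:
        raise ValueError("n lies outside SEQ ID NO:12")
    return SEQ12_FIRST_POSITION + n - 1


def position_to_seq12(pos: int) -> int:
    """Inverse of seq12_to_position for positions -4074 to -511."""
    n = pos - SEQ12_FIRST_POSITION + 1
    if not 1 <= n <= SEQ12_LENGTH:
        raise ValueError("position lies outside SEQ ID NO:12")
    return n
```

For example, `seq12_to_position(3492)` returns -583, the 5' end of SEQ ID NO:8, and nucleotides 279 to 3311 of SEQ ID NO:12 (SEQ ID NO:14) map to positions -3796 to -764.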




Specific Sequences ... Altering Endogenous IFNA2 Gene Expression

To obtain genomic DNA containing sequence upstream of an IFNA2 gene, a human leukocyte genomic library in EMBL3 (Clontech catalog # HL1006d) was screened with a 332 bp probe generated by PCR. This probe corresponds to the genomic region between positions -263 and +69 and was amplified from human genomic DNA using oligonucleotide primers designated IFN7 and IFN6, both of which were designed from the available IFNA2 genomic sequence (Fig. 1). The 5' end of primer IFN7 corresponds to position -263, and the primer's sequence is 5'-AGTTTCTAAAAAGGCTCTGGGGTA-3' (SEQ ID NO:3). The 5' end of primer IFN6 corresponds to position +69, and this primer's sequence is 5'-GCCCACAGAGCAGCTTGAC-3' (SEQ ID NO:4).

Approximately one million recombinant phage were screened with the radiolabeled 332 bp probe. Sixty plaques were isolated. Lambda phage DNA was isolated from thirty of these plaques and subjected to PCR assay using oligonucleotide primers IFN1 and IFN2. Both IFN1 and IFN2 are derived from the 3' untranslated region of the IFNA2 gene; their sequences can be found at the website http://www.ncbi.nlm.nih.gov/dbSTS, using the identification code NCBI_ID 42433. The 5' end of primer IFN1 corresponds to position +639, and the primer's sequence is 5'-AAAGACTCATGTTTCTGCTATGACC-3' (SEQ ID NO:5). The 5' end of primer IFN2 corresponds to position +853, and the primer's sequence is 5'-GGTGCACATGACATAATATGAACA-3' (SEQ ID NO:6). Of the thirty phage samples, two generated the expected 215 bp PCR product. One of the two phage plaques was further purified by two additional rounds of hybridization screening, yielding phage clone 4-4-1.

An 8.3 kb HindIII-BamHI fragment from phage 4-4-1 was subcloned into pBluescript II SK+ (Stratagene, La Jolla, CA) to produce pA2HB, which contains approximately 4.3 kb of untranscribed upstream sequences, the protein-coding region (1.1 kb), and approximately 2.8 kb of downstream sequences of the IFNA2 gene. A restriction map of the 8.3 kb HindIII-BamHI fragment is shown in Fig. 2.

The pA2HB plasmid was sequenced by the Sanger method. A 278 bp sequence whose 5' terminus is at the 5' end HindIII site is shown below (see also Fig. 3):

AAGCTTTTATAGGTGTAAATTTTCCACTTAGTACTGCTTTTG
TAATGTTGTCTTTTTATTTTCATTTATCTCAAGATGTTTTCT
AATTTCTCTTGACTTCCTTCTTAAATTCTTACCTCATGTAGA
CATACATTTTTGGCCCTATGCATTGGGATGCAAAACCAGACT
AATTTACTTTGTACAAAAAGAAAAATGAGAAAGAAATATATT
TGGTCTTGTGAGCACTATATGGAAATACTTTATATTCCATTT
GTTTCATCATATTCATATATCCCTTT (SEQ ID NO:7)

The HindIII site is located at position -4,073. A previously unpublished sequence between positions -583 and -511 was also determined, as shown below and as underlined in Fig. 4.

CATTGGATACTCCATCACCTGCTGTGATATTATGAATGTCTG
CCTATATAAATATTCACTATTCCATAACACA (SEQ ID NO:8)
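
The two printed sequences can be checked against the coordinates given in the text: SEQ ID NO:7 should be 278 nucleotides long and begin with the HindIII recognition site AAGCTT, and SEQ ID NO:8 should be 73 nucleotides long (positions -583 to -511). The sketch below transcribes both sequences from this section and runs those checks. It also includes a reverse-complement helper for working with the "or its complement" language; writing the complement 5' to 3' is a convention choice, not something the text specifies. The HindIII site is palindromic, so it is its own reverse complement.

```python
# SEQ ID NO:7 and SEQ ID NO:8 as printed above (5' to 3', one strand).
SEQ_ID_NO_7 = (
    "AAGCTTTTATAGGTGTAAATTTTCCACTTAGTACTGCTTTTG"
    "TAATGTTGTCTTTTTATTTTCATTTATCTCAAGATGTTTTCT"
    "AATTTCTCTTGACTTCCTTCTTAAATTCTTACCTCATGTAGA"
    "CATACATTTTTGGCCCTATGCATTGGGATGCAAAACCAGACT"
    "AATTTACTTTGTACAAAAAGAAAAATGAGAAAGAAATATATT"
    "TGGTCTTGTGAGCACTATATGGAAATACTTTATATTCCATTT"
    "GTTTCATCATATTCATATATCCCTTT"
)
SEQ_ID_NO_8 = (
    "CATTGGATACTCCATCACCTGCTGTGATATTATGAATGTCTG"
    "CCTATATAAATATTCACTATTCCATAACACA"
)
HINDIII_SITE = "AAGCTT"
_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Opposite strand written 5' to 3' (A<->T, C<->G, then reversed)."""
    return seq.upper().translate(_PAIR)[::-1]


def summarize() -> dict:
    """Length and HindIII checks for the two sequences."""
    return {
        "seq7_length": len(SEQ_ID_NO_7),
        "seq7_starts_with_hindiii": SEQ_ID_NO_7.startswith(HINDIII_SITE),
        "seq8_length": len(SEQ_ID_NO_8),
    }
```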

The sequence (SEQ ID NO:13) between the regions corresponding to SEQ ID NOs:7 and 8 was also determined.

The genomic sequence between positions -4,074 and -511 (SEQ ID NO:12) is the sequence which is not underlined in Fig. 6. SEQ ID NOs:7 and 8 correspond to nucleotides 1-278 and nucleotides 3492-3564 of SEQ ID NO:12, respectively.

To alter the expression of an endogenous IFNA2 gene, a DNA fragment containing nucleotides 279-3311 of SEQ ID NO:12 was cloned into a plasmid to produce targeting construct pGA402. Nucleotides 279-3311 of SEQ ID NO:12 were designated SEQ ID NO:14. The fragment was inserted upstream of a CMV promoter and a neomycin resistance gene, and the construct is schematically represented in Fig. 5. For the second targeting sequence of Fig. 5, a DNA fragment containing nucleotides -68 to +69 of the IFNA2 gene sequence shown in Fig. 1 was cloned downstream of the CMV promoter and neomycin resistance gene. Nucleotides -68 to +69 of the IFNA2 gene were designated SEQ ID NO:15. The pGA402 plasmid was introduced into human fibroblast cells exhibiting little or no IFNA2 gene expression, to allow homologous recombination with the endogenous IFNA2 gene. Cells resistant to G418 after plasmid introduction were screened to identify cells with increased IFNA2 gene expression, as would be expected if a homologous recombination event between pGA402 and the genomic DNA took place in the vicinity of the endogenous IFNA2 gene.

General Methodologies 

Alteration of Endogenous IFNA2 Expression 

Using the above-described IFNA2 upstream sequences, one can alter the expression of an endogenous human IFNA2 gene by a method as generally described in U.S. Patent No. 5,641,670. One strategy is shown in Fig. 5. In this strategy, a targeting construct is designed to include a first targeting sequence homologous to a first target site upstream of the gene, an amplifiable marker gene, a selectable marker gene, a regulatory region, a CAP site, an exon, a splice-donor site, an intron, a splice-acceptor site, and a second targeting sequence homologous to a second target site downstream of the first target site and terminating either within or upstream of the IFNA2-coding sequence. According to this strategy, the 5' end of the second target site is preferably less than 107 bp upstream of the normal IFNA2 translational initiation site, in order to avoid undesired ATG start codons within the transcribed sequence. A transcript produced from the homologously recombined locus will include the construct-derived exon, the construct-derived splice-donor site, the construct-derived intron, the construct-derived splice-acceptor site, any sequence between any of those elements, and the sequence from the construct-derived splice-acceptor site through the entire endogenous coding sequence to the transcription termination site of the IFNA2 gene. Splicing of this transcript will generate an mRNA which can be translated to produce a precursor of human IFNA2, having either the normal IFNA2 secretion signal sequence or a genetically engineered secretion signal sequence, depending on the characteristics of the construct-derived exon. The size of the exogenous intron, and thus the position of the exogenous regulatory region relative to the coding region of the gene, can be varied to optimize the function of the regulatory region.

In any activation strategy, the first and second target sites need not be immediately adjacent or even be near each other. When they are not immediately adjacent to each other, a portion of the IFNA2 gene's normal upstream region and/or a portion of the coding region would be deleted upon homologous recombination.
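
The consequence of non-adjacent target sites can be pictured with a toy string model of a double-crossover event, in which everything between the two target sites is replaced by the construct's internal elements. This is a deliberately simplified sketch: it assumes exact-match target sites, one crossover within each, and hypothetical toy sequences; it does not model the recombination machinery itself.

```python
def double_crossover(genome: str, arm1: str, arm2: str, payload: str) -> tuple[str, str]:
    """Toy replacement at two exact-match target sites.

    Keeps `genome` through the end of `arm1`, inserts `payload`, and resumes at the
    start of `arm2`. Returns (recombined sequence, genomic segment that was deleted).
    """
    start1 = genome.find(arm1)
    if start1 < 0:
        raise ValueError("first target site not found")
    end1 = start1 + len(arm1)
    start2 = genome.find(arm2, end1)
    if start2 < 0:
        raise ValueError("second target site not found 3' of the first")
    return genome[:end1] + payload + genome[start2:], genome[end1:start2]
```

With adjacent target sites the deleted segment is empty; the further apart the sites, the more of the native upstream (or coding) region is lost.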

Mutations that facilitate alteration of endogenous IFNA2 expression may be introduced into the chromosomal DNA via homologous recombination. For instance, it may be desirable to abolish a spurious and undesired ATG initiation codon upstream of the correct ATG initiation codon and between the exogenous regulatory region and the endogenous IFNA2 coding region in the homologously recombined locus. To do so, one can employ a targeting construct having a targeting sequence homologous to a genomic site that spans the undesired ATG initiation codon. This targeting sequence contains nucleotides that correspond to the desired mutation, e.g., contains ATT instead of ATG. The targeting construct optionally includes one or more selectable markers to facilitate selection of homologously recombined cells. An exogenous regulatory region can then be introduced into the homologously recombined cells upstream of the altered sites, using the expression alteration method of the invention.
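
The ATG-to-ATT substitution is a design-time edit to the targeting sequence. As a sketch only (helper names are hypothetical, and sequences are read on the coding strand, where an mRNA AUG appears as ATG), the code below finds every ATG that begins 5' of a chosen correct start codon and changes its G to T, leaving the correct ATG intact. Replacing that G with T cannot create a new ATG, because the base immediately 5' of it is a T, not an A.

```python
def find_codons(seq: str, codon: str = "ATG") -> list[int]:
    """0-based start indices of every occurrence of `codon` on the strand given."""
    return [i for i in range(len(seq) - len(codon) + 1) if seq.startswith(codon, i)]


def silence_upstream_atgs(seq: str, true_start: int) -> str:
    """Rewrite each ATG beginning before `true_start` (0-based) as ATT.

    The ATG at `true_start` is left untouched; an upstream ATG cannot overlap it.
    """
    if seq[true_start:true_start + 3] != "ATG":
        raise ValueError("true_start does not point at an ATG")
    chars = list(seq)
    for i in find_codons(seq):
        if i < true_start:
            chars[i + 2] = "T"  # ATG -> ATT
    return "".join(chars)
```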

Alternatively, the exogenous regulatory region and the desired sequence mutation(s) may be introduced into the genomic DNA in a single step. The DNA construct used in this embodiment may contain both the exogenous regulatory region and a targeting sequence that contains nucleotides corresponding to the desired mutation(s). One may also co-transfect or co-infect two separate constructs into target cells, with one construct containing the regulatory region and the other containing nucleotides corresponding to the desired mutation.

If desired, a mammalian splice-acceptor site may be introduced into the genomic DNA, e.g., at a site between an undesired ATG initiation codon and the correct ATG initiation codon, in a similar manner. The DNA construct used for this purpose contains a targeting sequence homologous to a genomic site upstream of the correct IFNA2 initiation codon and, adjacent to or embedded within the homologous sequence, a sequence corresponding to the desired splice-acceptor site. Cells containing the correctly recombined IFNA2 locus are then transfected or infected with a second construct containing an exogenous regulatory region and an exon with an unpaired splice-donor site at its 3' end, together with targeting sequence(s) which target the second construct to a genomic region upstream of the inserted splice-acceptor site. A primary transcript produced under the control of the exogenous regulatory region will include the exogenous exon, the exogenous splice-donor site, the exogenous splice-acceptor site, any sequences between those elements, and the sequence between the exogenous splice-acceptor site and the transcriptional termination site of the endogenous IFNA2 gene. Upon splicing, the splice-donor site of the transcript will be spliced to the splice-acceptor site, and the intervening intronic RNA, which may contain undesirable AUG initiation codons, will be removed. Any problems associated with having a transcript with undesired AUG translation initiation codons between the transcription start site and the IFNA2-coding sequence are thereby avoided. Of course, the regulatory region, exon, splice-donor site, and splice-acceptor site can instead be introduced in a single step. The DNA construct used in this embodiment contains a regulatory region, an exon, a splice-donor site, an intron, a splice-acceptor site, a targeting sequence homologous to a genomic site between the correct IFNA2 initiation codon and the undesired ATG codon, and, optionally, one or more selectable markers. Alternatively, two separate targeting constructs may be useful, with one containing the regulatory region, the exon, and the splice-donor site, and the other containing the splice-acceptor site. The two constructs can be introduced into target cells in a single step.

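
Why splicing rescues the transcript can be shown with a toy model: remove the intron (the span from the splice-donor junction to the splice-acceptor junction) from a primary transcript and check that the first AUG in the spliced RNA is now the intended one. The sequences, junction indices, and GU...AG boundary check below are illustrative assumptions; real splice-site choice depends on more than these dinucleotides, and "first AUG" is a simplified stand-in for ribosomal scanning.

```python
def splice(pre_mrna: str, intron_start: int, intron_end: int) -> str:
    """Join the exons flanking pre_mrna[intron_start:intron_end] (0-based, end-exclusive)."""
    intron = pre_mrna[intron_start:intron_end]
    # Toy sanity check: most introns begin with GU and end with AG.
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("intron lacks canonical GU...AG ends")
    return pre_mrna[:intron_start] + pre_mrna[intron_end:]


def first_aug(rna: str) -> int:
    """0-based index of the first AUG, or -1 if there is none."""
    return rna.find("AUG")


# Hypothetical toy transcript: exon 1, an intron carrying two spurious AUGs, then exon 2
# beginning with the correct start codon region.
exon1, intron, exon2 = "GGCACC", "GUAAUGCCAUGCAG", "CCAUGGCUUAA"
pre = exon1 + intron + exon2
mrna = splice(pre, len(exon1), len(exon1) + len(intron))
```

Before splicing, the first AUG lies inside the intron; after splicing, the first AUG is the one carried by exon 2.
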
The DNA Construct 

The DNA construct of the invention includes at least a targeting sequence and a regulatory sequence. It can additionally include an exon; or an exon and a splice-donor site; or an exon, a splice-donor site, an intron, and a splice-acceptor site. In the construct, the exon, if present, is 3' of the regulatory sequence, and the splice-donor site, if present, is at the 3' end of the exon. The intron and splice-acceptor site, if present, are 3' of the splice-donor site. In addition, there can be multiple exons and introns (with appropriate splice-donor and splice-acceptor sites) in the construct. The DNA in the construct is referred to as exogenous, since the DNA is not an original part of the genome of a host cell. Exogenous DNA may possess sequences identical to or different from portions of the endogenous genomic DNA present in the cell prior to transfection or infection by a viral vector. As used herein, "transfection" means introduction of a plasmid into a cell by nonviral (e.g., chemical or physical) means, such as calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, microinjection, microprojectiles, or biolistic-mediated uptake. "Infection" means introduction of a viral vector into a cell by viral infection. Various elements included in the DNA construct of the invention are described in detail below.

The DNA construct can also include cis-acting or trans-acting viral sequences (e.g., packaging signals), thereby enabling delivery of the construct into the nucleus of a cell via infection by a viral vector. Where necessary, the DNA construct can be disengaged from various steps of a virus life cycle, such as integrase-mediated integration in retroviruses or episome maintenance. Disengagement can be accomplished by appropriate deletions or mutations of viral sequences, such as a deletion of the integrase coding region in a retrovirus vector. Additional details regarding the construction and use of viral vectors are found in Robbins et al., Pharmacol. Ther. 80:35-47, 1998; and Gunzburg et al., Mol. Med. Today 1:410-417, 1995, herein incorporated by reference.

Targeting Sequences

Targeting sequences permit homologous recombination of a desired sequence into a selected site in the host genome. Targeting sequences are homologous to (i.e., able to homologously recombine with) their respective target sites in the host genome.

A circular DNA construct can employ a single targeting sequence, or two or more separate targeting sequences. A linear DNA construct may contain two or more separate targeting sequences. The target site to which a given targeting sequence is homologous can reside within the coding region of the IFNA2 gene, upstream of and immediately adjacent to the coding region, or upstream of and at a distance from the coding region.

The first of the two targeting sequences in the construct (or the entire targeting sequence, if there is only one targeting sequence in the construct) is derived from the newly disclosed genomic regions upstream of the IFNA2-coding sequence. This targeting sequence contains a portion (e.g., 20 or more contiguous nucleotides) of SEQ ID NO: 12, e.g., a portion of SEQ ID NO: 7, 8, or 13.
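
The "20 or more contiguous nucleotides" criterion can be checked computationally. The sketch below assumes a hypothetical 32-nucleotide stand-in for the reference (SEQ ID NO: 12 is not reproduced in this section) and treats "complement" as the reverse complement of the reference strand, which is an assumption about how the claims' "or its complement" language is applied.

```python
# Does a candidate share a run of at least k contiguous nucleotides with a
# reference sequence or its reverse complement? REFERENCE is a hypothetical
# stand-in; SEQ ID NO: 12 is not reproduced here.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_contiguous_run(candidate: str, reference: str, k: int = 20) -> bool:
    """True if candidate contains some k-mer of reference or of its reverse complement."""
    candidate, reference = candidate.upper(), reference.upper()
    ref_kmers = kmers(reference, k) | kmers(reverse_complement(reference), k)
    return not ref_kmers.isdisjoint(kmers(candidate, k))


REFERENCE = "ACGTTGCAAGGCTTACCGATCGGATCCATGCA"  # hypothetical, 32 nt
```

Passing k = 50, 100, or 200 mirrors the longer minimum lengths recited in the claims; whether a fragment actually serves as a targeting sequence still depends on homologous recombination in the cell.
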

The second of the two targeting sequences in the construct may target a genomic region upstream of the coding sequence or target part or all of the coding sequence itself. By way of example, the second targeting sequence may contain, at its 3' end, an "exogenous" coding region identical to the first few codons of the IFNA2 coding sequence. Upon homologous recombination, the exogenous coding region recombines with the targeted part of the endogenous IFNA2-coding sequence. If desired, the exogenous coding region may encode a heterologous amino acid sequence, so long as the exogenous coding region remains sufficiently homologous to the endogenous coding region it replaces to permit homologous recombination.

The targeting sequence may additionally include sequence derived from a previously disclosed region of the IFNA2 gene, including those described herein, as well as a region further upstream which is structurally uncharacterized but can be mapped by one skilled in the art.

Genomic fragments useful as targeting sequences can be identified by their ability to hybridize to a probe containing all or a portion of SEQ ID NO: 12. Such a probe can be generated by PCR using primers derived from SEQ ID NO: 12.

The Regulatory Sequence 

The regulatory sequence of the DNA construct can contain one or more promoters (e.g., a constitutive, tissue-specific, or inducible promoter), enhancers, scaffold-attachment regions or matrix attachment sites, negative regulatory elements, transcription factor binding sites, or combinations of these elements.

The regulatory sequence can be derived from a eukaryotic (e.g., mammalian) or viral genome. Useful regulatory sequences include, but are not limited to, those that regulate the expression of SV40 early or late genes, cytomegalovirus genes, and adenovirus major late genes. They also include regulatory regions derived from genes encoding mouse metallothionein-I, elongation factor-1α, collagen (e.g., collagen Iα1, collagen Iα2, and collagen IV), actin (e.g., γ-actin), immunoglobulin, HMG-CoA reductase, glyceraldehyde phosphate dehydrogenase, 3-phosphoglycerate kinase, collagenase, stromelysin, fibronectin, vimentin, plasminogen activator inhibitor I, thymosin β4, tissue inhibitors of metalloproteinase, ribosomal proteins, major histocompatibility complex molecules, and human leukocyte antigens.

The regulatory sequence preferably contains a transcription factor binding site, such as a TATA box, CCAAT box, AP1, Sp1, or an NF-κB binding site.

Marker Genes 

If desired, the construct can include a sequence encoding a desired polypeptide, operatively linked to its own promoter. An example of this would be a selectable marker gene, which can be used to facilitate the identification of a targeting event. An amplifiable marker gene can also be included and used to facilitate selection of cells having co-amplified flanking DNA sequences. Cells containing amplified copies of the amplifiable marker gene can be identified by growth in the presence of an agent that selects for the expression of the amplifiable gene. The activated endogenous IFNA2 gene will be amplified in tandem with the amplified selectable marker gene. Cells containing multiple copies of the activated endogenous gene may produce very high levels of IFNA2 and are thus useful for in vitro protein production and gene therapy.

The selectable and amplifiable marker genes do not have to lie immediately adjacent to each other. The amplifiable marker gene and selectable marker gene can be the same gene. One or both of the marker genes can be situated in the intron of the DNA construct. Suitable amplifiable marker genes and selectable marker genes are described in U.S. Patent No. 5,641,670.

The Splice-Donor and Splice-Acceptor Sites 
The DNA construct may further contain an exon, a splice-donor site at the 3' end of the exon, an intron, and a splice-acceptor site.

A splice-donor site is a sequence which directs the splicing of one exon of an RNA transcript to the splice-acceptor site of another exon of the transcript, resulting in removal of the intron between the two sites. Generally, the first exon lies 5' of the second exon, and the splice-donor site located at the 3' end of the first exon is paired with a splice-acceptor site flanking the second exon on the 5' side of the second exon. Splice-donor sites have a characteristic consensus sequence represented as (A/C)AGGURAGU (where R denotes a purine nucleotide), with the GU in the fourth and fifth positions being required (Jackson, Nucleic Acids Research 19:3715-3798, 1991). The first three bases of the splice-donor consensus site are the last three bases of the exon, i.e., they are not spliced out. Splice-donor sites are functionally defined by their ability to effect the appropriate reaction within the mRNA splicing pathway.

A splice-acceptor site in a construct of the invention directs, in conjunction with a splice-donor site, the splicing of one exon to another exon. Splice-acceptor sites have a characteristic sequence (SEQ ID NO:10), written with Y (any pyrimidine) and N (any nucleotide) positions and ending in the dinucleotide AG (Jackson, Nucleic Acids Research 19:3715-3798, 1991).
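
The splice-donor consensus described above lends itself to a pattern search. The sketch below is a minimal exact-consensus matcher, assuming that T may stand in for U so that DNA and RNA strings can both be scanned; it is not a splice-site predictor, since natural donor sites often deviate from the consensus (only the GU is described as required) and are defined functionally. The acceptor consensus (SEQ ID NO:10) is not modeled.

```python
import re

# Splice-donor consensus (A/C)AG|GURAGU, where R is a purine (A or G).
# T is accepted wherever U appears so DNA and RNA strings can both be scanned.
# The zero-width lookahead lets overlapping candidate sites be reported.
_DONOR = re.compile(r"(?=([AC]AGG[TU][AG]AG[TU]))")


def donor_sites(seq: str) -> list[tuple[int, str]]:
    """Return (exon_end, site) for each exact consensus match.

    exon_end is the index just past the last exon base: the first three
    consensus bases stay in the exon, and the intron begins with GU (GT).
    """
    seq = seq.upper()
    return [(m.start() + 3, m.group(1)) for m in _DONOR.finditer(seq)]
```

For example, in the hypothetical string "CCGCCAGGTAAGTCTT" the match "CAGGTAAGT" places the exon boundary after "CCGCCAG", so the intron would begin with "GT".
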

The CAP Site

The DNA construct can optionally contain a CAP site. A CAP site is a specific transcription start site which is associated with and utilized by the regulatory sequence. This CAP site is located at a position relative to the regulatory sequence in the construct such that, upon homologous recombination, the regulatory sequence directs synthesis of a transcript that begins at the CAP site. Alternatively, no CAP site is included in the construct, and the transcriptional apparatus will by default define an appropriate site in the targeted gene to be utilized as a CAP site.

Additional DNA Elements

The construct may additionally contain sequences which affect the structure or stability of the RNA or protein produced by homologous recombination. Optionally, the DNA construct can include a bacterial origin of replication and bacterial antibiotic resistance markers or other selectable markers, which allow for large-scale plasmid propagation in bacteria or any other suitable cloning/host system.

All of the above-described elements of the DNA construct are operatively linked or functionally placed with respect to each other. That is, upon homologous recombination between the construct and the targeted genomic DNA, the regulatory sequence can direct the production of a primary RNA transcript which initiates at a CAP site (optionally included in the construct) and includes the sequence lying between the CAP site and the endogenous IFNA2 gene's transcription stop site. Depending on the location of the CAP site, a portion of this sequence may include the IFNA2 gene's endogenous regulatory region as well as sequences neighboring that region that are normally not transcribed. If an exon, a splice-donor site, and a splice-acceptor site are present in the construct, the primary transcript will also include the exon, the two splice sites, and the intron between the two sites.

The order of elements in the DNA construct can vary. Where the construct is a circular plasmid or viral vector, the relative order of elements in the resulting structure can be, for example: a targeting sequence, plasmid DNA (comprising sequences used for the selection and/or replication of the targeting plasmid in a microbial or other suitable host), selectable marker(s), a regulatory sequence, an exon, a splice-donor site, an intron, and a splice-acceptor site.

Where the construct is linear, the order can be, for example: a first targeting sequence, a selectable marker gene, a regulatory sequence, an exon, a splice-donor site, an intron, a splice-acceptor site, and a second targeting sequence; or, in the alternative, a first targeting sequence, a regulatory sequence, an exon, a splice-donor site, an intron, a splice-acceptor site, a selectable marker gene, optionally an internal ribosomal entry site, and a second targeting sequence.

Alternatively, the order can be: a first targeting sequence, a first selectable marker gene, a regulatory sequence, an exon, a splice-donor site, an intron, a splice-acceptor site, a second targeting sequence, and a second selectable marker gene; or a first targeting sequence, a regulatory sequence, an exon, a splice-donor site, an intron, a splice-acceptor site, a first selectable marker gene, a second targeting sequence, and a second selectable marker gene. Recombination between the targeting sequences flanking the first selectable marker and homologous sequences in the host genome results in the targeted integration of the first selectable marker, while the second selectable marker is not integrated. Desired transfected or infected cells are those that are stably transfected or infected with the first, but not the second, selectable marker. Such cells can be selected for by growth in a medium containing an agent which selects for expression of the first marker and another agent which selects against the second marker. Transfected or infected cells that have improperly integrated the targeting construct by a mechanism other than homologous recombination would be expected to express the second marker gene and will thereby be killed in the selection medium.
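
The positive-negative selection logic just described reduces to a simple survival rule. The sketch below assumes, as an idealization, that every clone which integrated by a mechanism other than homologous recombination retains and expresses the second marker; the text says only that such cells "would be expected to", so real selections let some random integrants escape. The clone categories are hypothetical labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    """Marker status of one transfected or infected clone (hypothetical categories)."""
    label: str
    has_first_marker: bool   # positively selected marker between the targeting sequences
    has_second_marker: bool  # negatively selected marker outside the targeting sequences


def survives_selection(clone: Clone) -> bool:
    """Medium selects for the first marker and against the second."""
    return clone.has_first_marker and not clone.has_second_marker


clones = [
    Clone("homologous recombinant", has_first_marker=True, has_second_marker=False),
    Clone("random integrant", has_first_marker=True, has_second_marker=True),
    Clone("untransfected", has_first_marker=False, has_second_marker=False),
]
survivors = [c.label for c in clones if survives_selection(c)]
print(survivors)  # ['homologous recombinant']
```
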

A positively selectable marker is sometimes included in the construct to allow for the selection of cells containing amplified copies of that marker. In this embodiment, the order of construct components can be, for example: a first targeting sequence, an amplifiable positively selectable marker, a second selectable marker (optional), a regulatory sequence, an exon, a splice-donor site, an intron, a splice-acceptor site, and a second targeting DNA sequence.
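
The ordering rules stated for the DNA construct (exon 3' of the regulatory sequence, splice-donor site at the 3' end of the exon, intron and splice-acceptor site 3' of the donor) can be checked against the example orders listed in this section. The sketch below uses hypothetical shorthand names for the elements and handles only a single exon/intron unit; constructs with multiple exons and introns, which the text also permits, are outside its scope.

```python
# Check the 5'-to-3' ordering rules for construct elements against the
# example orders given in the text. Element names are hypothetical shorthand.

def follows_order_rules(elements: list[str]) -> bool:
    """Return True if a single-exon construct obeys the stated ordering rules."""
    pos = {name: i for i, name in enumerate(elements)}
    # A targeting sequence and a regulatory sequence are required.
    if "regulatory" not in pos or not any(e.startswith("targeting") for e in elements):
        return False
    # The exon, if present, lies 3' of the regulatory sequence.
    if "exon" in pos and pos["exon"] < pos["regulatory"]:
        return False
    # The splice-donor site, if present, sits at the 3' end of the exon.
    if "donor" in pos and pos.get("exon", -2) + 1 != pos["donor"]:
        return False
    # The intron and splice-acceptor site, if present, lie 3' of the donor.
    for later in ("intron", "acceptor"):
        if later in pos and ("donor" not in pos or pos[later] < pos["donor"]):
            return False
    return True


SPLICE_UNIT = ["regulatory", "exon", "donor", "intron", "acceptor"]

EXAMPLE_ORDERS = {
    "circular": ["targeting", "plasmid_dna", "marker", *SPLICE_UNIT],
    "linear_a": ["targeting_1", "marker", *SPLICE_UNIT, "targeting_2"],
    "linear_b": ["targeting_1", *SPLICE_UNIT, "marker", "ires", "targeting_2"],
    "pos_neg_a": ["targeting_1", "marker_1", *SPLICE_UNIT, "targeting_2", "marker_2"],
    "pos_neg_b": ["targeting_1", *SPLICE_UNIT, "marker_1", "targeting_2", "marker_2"],
    "amplifiable": ["targeting_1", "amplifiable_marker", "marker_2", *SPLICE_UNIT,
                    "targeting_2"],
}

results = {name: follows_order_rules(order) for name, order in EXAMPLE_ORDERS.items()}
```

Every example order from the text passes, while an order with the exon placed 5' of the regulatory sequence, or with the intron between the exon and the donor, does not.
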

The various elements of the construct can be obtained from natural sources (e.g., genomic DNA), or can be produced using genetic engineering techniques or synthetic processes. The regulatory region, CAP site, splice-donor site, intron, and splice-acceptor site of the construct can be isolated as a complete unit from, e.g., the human elongation factor-1α gene (GenBank sequence HUMEF1A) or the cytomegalovirus immediate early region (GenBank sequence HEHCMVP1). These components can also be isolated from separate genes.

Transfection or Infection and Homologous Recombination

The DNA construct of the invention can be introduced into the cell, such as a primary, secondary, or immortalized cell, as a single DNA construct, or as separate DNA sequences which become incorporated into the chromosomal or nuclear DNA of a transfected or infected cell. By "transfected cell" is meant a cell into which (or into an ancestor of which) a DNA or RNA molecule has been introduced by a means other than using a viral vector. By "infected cell" is meant a cell into which (or into an ancestor of which) a DNA or RNA molecule has been introduced using a viral vector. Viruses known to be useful as vectors include adenovirus, adeno-associated virus, herpes virus, mumps virus, poliovirus, lentivirus, retroviruses, Sindbis virus, and vaccinia virus and other poxviruses such as canary pox virus. The DNA can be introduced as a linear, double-stranded (with or without single-stranded regions at one or both ends), single-stranded, or circular molecule. When the construct is introduced into host cells in two separate DNA fragments, the two fragments share DNA sequence homology (overlap) at the 3' end of one fragment and the 5' end of the other, while one carries a first targeting sequence and the other carries a second targeting sequence. Upon introduction into a cell, the two fragments can undergo homologous recombination to form a single molecule with the first and second targeting sequences flanking the region of overlap between the two original fragments. The product molecule is then in a form suitable for homologous recombination with the cellular target sites. More than two fragments can be used, with each of them designed such that they will undergo homologous recombination with each other to ultimately form a product suitable for homologous recombination with the cellular target sites as described above.
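
At the level of sequence, joining fragments that overlap at the 3' end of one and the 5' end of the next can be modeled as a suffix-prefix merge. This is only a string-level sketch: in the cell the joining is carried out by homologous recombination, both fragments are assumed to be written in the same orientation, and the 20-base minimum overlap is an arbitrary assumption, not a value from the text.

```python
from typing import Optional


def join_by_overlap(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Merge two fragments when the 3' end of left matches the 5' end of right.

    Tries the longest possible overlap first; returns None when no overlap of
    at least min_overlap bases (and never fewer than 1) exists.
    """
    floor = max(min_overlap, 1)
    for k in range(min(len(left), len(right)), floor - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


def assemble(fragments: list[str], min_overlap: int = 20) -> Optional[str]:
    """Join fragments pairwise in the given order; None if any join fails."""
    product: Optional[str] = fragments[0]
    for frag in fragments[1:]:
        if product is None:
            return None
        product = join_by_overlap(product, frag, min_overlap)
    return product
```

With three or more fragments, `assemble` mirrors the multi-fragment case: each fragment must overlap the next for the final product to carry both outer targeting sequences.
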

The DNA construct of the invention, if not containing a selectable marker itself, can be co-transfected or co-infected with another construct that contains such a marker. A targeting plasmid may be cleaved with a restriction enzyme at one or more sites to create a linear or gapped molecule prior to transfection or infection. The resulting free DNA ends increase the frequency of the desired homologous recombination event. In addition, the free DNA ends may be treated with an exonuclease to create overhanging 5' or 3' single-stranded DNA ends (e.g., at least 30 nucleotides in length, and preferably 100-1000 nucleotides in length) to increase the frequency of the desired homologous recombination event. In this embodiment, homologous recombination between the targeting sequence and the genomic target will result in two copies of the targeting sequences, flanking the elements contained within the introduced plasmid.

The DNA constructs may be transfected into cells (preferably in vitro) by a variety of physical or chemical methods, including electroporation, microinjection, microprojectile bombardment, calcium phosphate precipitation, liposome delivery, or polybrene- or DEAE-dextran-mediated transfection.

The transfected or infected cell is maintained under conditions which permit homologous recombination, as described in the art (see, e.g., Capecchi, Science 244:1288-1292, 1989). When the homologously recombinant cell is maintained under conditions sufficient to permit transcription of the DNA, the regulatory region introduced by the DNA construct will alter transcription of the IFNA2 gene.

Homologously recombinant cells (i.e., cells that have undergone the desired homologous recombination) can be identified by phenotypic screening or by analyzing the culture supernatant in enzyme-linked immunosorbent assays (ELISA) for IFNA2. Commercial ELISA kits for detecting IFNA2 are available from Biosource International (Camarillo, CA). Homologously recombinant cells can also be identified by Southern and Northern analyses or by polymerase chain reaction (PCR) screening.

As used herein, the term "primary cells" includes (i) cells present in a suspension of cells isolated from a vertebrate tissue source (prior to their being plated, i.e., attached to a tissue culture substrate such as a dish or flask), (ii) cells present in an explant derived from tissue, (iii) cells plated for the first time, and (iv) cell suspensions derived from these plated cells. Primary cells can also be cells as they naturally occur within a human or an animal.

Secondary cells are cells at all subsequent steps in culturing. That is, the first time that plated primary cells are removed from the culture substrate and replated (passaged), they are referred to herein as secondary cells, as are all cells in subsequent passages. Secondary cell strains consist of secondary cells which have been passaged one or more times. Secondary cells typically exhibit a finite number of mean population doublings in culture and the property of contact-inhibited, anchorage-dependent growth (anchorage dependence does not apply to cells that are propagated in suspension culture). Primary and secondary cells are not immortalized.

Immortalized cells are cell lines (as opposed to cell strains, with the designation "strain" reserved for primary and secondary cells) that exhibit an apparently unlimited lifespan in culture.

Cells selected for transfection or infection can fall into four types or categories: (i) cells which do not, as obtained, make or contain more than trace amounts of the IFNA2 protein; (ii) cells which make or contain the protein but in quantities other than those desired (such as in quantities less than the level which is physiologically normal for the type of cells as obtained); (iii) cells which make the protein at a level which is physiologically normal for the type of cells as obtained, but are to be augmented or enhanced in their content or production; and (iv) cells in which it is desirable to change the pattern of regulation or induction of a gene encoding the protein.

Primary, secondary, and immortalized cells to be transfected or infected by the present method can be obtained from a variety of tissues and include all appropriate cell types which can be maintained in culture. For example, suitable primary and secondary cells include fibroblasts, keratinocytes, epithelial cells (e.g., mammary epithelial cells, intestinal epithelial cells), endothelial cells, glial cells, neural cells, formed elements of the blood (e.g., lymphocytes, bone marrow cells), muscle cells, and precursors of these somatic cell types. Where the homologously recombinant cells are to be used in gene therapy, primary cells are preferably obtained from the individual to whom the transfected or infected primary or secondary cells are to be administered. However, primary cells can be obtained from a donor (i.e., an individual other than the recipient) of the same species.

Examples of immortalized human cell lines useful for protein production or gene therapy include, but are not limited to, 2780AD ovarian carcinoma cells (Van der Blick et al., Cancer Res. 48:5927-5932, 1988), A549 (American Type Culture Collection ("ATCC") CCL 185), BeWo (ATCC CCL 98), Bowes melanoma cells (ATCC CRL 9607), CCRF-CEM (ATCC CCL 119), CCRF-HSB-2 (ATCC CCL 120.1), COLO201 (ATCC CCL 224), COLO205 (ATCC CCL 222), COLO 320DM (ATCC CCL 220), COLO 320HSR (ATCC CCL 220.1), Daudi cells (ATCC CCL 213), Detroit 562 (ATCC CCL 138), HeLa cells and derivatives of HeLa cells (ATCC CCL 2, 2.1, and 2.2), HCT116 (ATCC CCL 247), HL-60 cells (ATCC CCL 240), HT1080 cells (ATCC CCL 121), IMR-32 (ATCC CCL 127), Jurkat cells (ATCC TIB 152), K-562 leukemia cells (ATCC CCL 243), KB carcinoma cells (ATCC CCL 17), KG-1 (ATCC CCL 246), KG-1a (ATCC CCL 246.1), LS123 (ATCC CCL 255), LS174T (ATCC CL-188), LS180 (ATCC CL-187), MCF-7 breast cancer cells (ATCC HTB 22), MOLT-4 cells (ATCC CRL 1582), Namalwa cells (ATCC CRL 1432), NCI-H498 (ATCC CCL 254), NCI-H508 (ATCC CCL 253), NCI-H548 (ATCC CCL 249), NCI-H716 (ATCC CCL 251), NCI-H747 (ATCC CCL 252), NCI-H1688 (ATCC CCL 257), NCI-H2126 (ATCC CCL 256), Raji cells (ATCC CCL 86), RD (ATCC CCL 136), RPMI 2650 (ATCC CCL 30), RPMI 8226 cells (ATCC CCL 155), SNU-C2A (ATCC CCL 250.1), SNU-C2B (ATCC CCL 250), SW-13 (ATCC CCL 105), SW48 (ATCC CCL 231), SW403 (ATCC CCL 230), SW480 (ATCC CCL 228), SW620 (ATCC CCL 227), SW837 (ATCC CCL 235), SW948 (ATCC CCL 237), SW1116 (ATCC CCL 233), SW1417 (ATCC CCL 238), SW1463 (ATCC CCL 234), T84 (ATCC CCL 248), U-937 cells (ATCC CRL 1593), WiDr (ATCC CCL 218), and WI-38VA13 subline 2R4 cells (ATCC CCL 75.1), as well as heterohybridoma cells produced by fusion of human cells and cells of another species. Secondary human fibroblast strains, such as WI-38 (ATCC CCL 75) and MRC-5 (ATCC CCL 171), may be used. In addition, primary, secondary, or immortalized human cells, as well as primary, secondary, or immortalized cells from other species, can be used for in vitro protein production or gene therapy.

IFNA2-Expressing Cells

Homologously recombinant cells of the invention express IFNA2 at desired levels and are useful for both in vitro production of IFNA2 and gene therapy.

Protein Production 

Homologously recombinant cells according to this invention can be used for in vitro production of IFNA2. The cells are maintained under conditions, as described in the art, which result in expression of proteins. The IFNA2 protein may be purified from cell lysates or cell supernatants. A pharmaceutical composition containing the IFNA2 protein can be delivered to a human or an animal by conventional pharmaceutical routes known in the art (e.g., oral, intravenous, intramuscular, intranasal, pulmonary, transmucosal, intradermal, rectal, intrathecal, transdermal, subcutaneous, intraperitoneal, or intralesional). Oral administration may require use of a strategy for protecting the protein from degradation in the gastrointestinal tract, e.g., by encapsulation in polymeric microcapsules.

Gene Therapy 

Homologously recombinant cells of the present invention are useful as populations of homologously recombinant cell lines, as populations of homologously recombinant primary or secondary cells, as homologously recombinant clonal cell strains or lines, as homologously recombinant heterogeneous cell strains or lines, and as cell mixtures in which at least one representative cell of one of the four preceding categories of homologously recombinant cells is present. Such cells may be used in a delivery system for treating (i) infections caused by such viruses as papilloma virus, hepatitis B and C viruses, vaccinia, herpes simplex virus, herpes zoster varicellosus virus, and rhinovirus, and (ii) any other conditions treatable with IFNA2.

Homologously recombinant primary cells, clonal cell strains, or heterogeneous cell strains are administered to an individual in whom the abnormal or undesirable condition is to be treated or prevented, in sufficient quantity and by an appropriate route, to express or make available the protein or exogenous DNA at physiologically relevant levels. A physiologically relevant level is one which either approximates the level at which the product is normally produced in the body or results in improvement of the abnormal or undesirable condition. If the cells are syngeneic with respect to an immunocompetent recipient, the cells can be administered or implanted intravenously, intraarterially, subcutaneously, intraperitoneally, intraomentally, subrenal capsularly, intrathecally, intracranially, or intramuscularly.

If the cells are not syngeneic and the recipient is immunocompetent, the homologously recombinant cells to be administered can be enclosed in one or more semipermeable barrier devices. The permeability properties of the device are such that the cells are prevented from leaving the device upon implantation into a subject, but the therapeutic protein is freely permeable and can leave the barrier device and enter the local space surrounding the implant or enter the systemic circulation. See, e.g., U.S. Patent Nos. 5,641,670, 5,470,731, 5,620,883, and 5,487,737, and co-owned U.S. Patent Application entitled "Delivery of Therapeutic Proteins" (inventors: Justin C. Lamsa and Douglas A. Treco), filed April 16, 1999, all herein incorporated by reference. The barrier device can be implanted at any appropriate site, e.g., intraperitoneally, intrathecally, subcutaneously, intramuscularly, within the kidney capsule, or within the omentum.

Barrier devices are particularly useful and allow homologously recombinant immortalized cells, homologously recombinant cells from another species (homologously recombinant xenogeneic cells), or cells from a nonhistocompatibility-matched donor (homologously recombinant allogeneic cells) to be implanted for treatment of a subject. The devices retain cells in a fixed position in vivo, while protecting the cells from the host's immune system. Barrier devices also allow convenient short-term (i.e., transient) therapy by allowing ready removal of the cells when the treatment regimen is to be halted for any reason. Transfected or infected xenogeneic and allogeneic cells may also be used in the absence of barrier devices for short-term gene therapy. In that case, the IFNA2 produced by the cells will be delivered in vivo until the cells are rejected by the host's immune system.

A number of synthetic, semisynthetic, or natural filtration membranes can be used for this purpose, including, but not limited to, cellulose, cellulose acetate, nitrocellulose, polysulfone, polyvinylidene difluoride, polyvinyl chloride polymers, and polymers of polyvinyl chloride derivatives. Barrier devices can be utilized to allow primary, secondary, or immortalized cells from another species to be used for gene therapy in humans.

Another type of device useful in the gene therapy of the invention is an implantable collagen matrix in which the cells are embedded. Such a device, which can contain beads to which the cells attach, is described in WO 97/15195, herein incorporated by reference. It can be implanted as described above.

The number of cells needed for a given dose or implantation depends on several factors, including the expression level of the protein, the size and condition of the host animal, and the limitations associated with the implantation procedure. Usually the number of cells implanted in an adult human or other similarly sized animal is in the range of 1 x 10^4 to 5 x 10^10, and preferably 1 x 10^8 to 1 x 10^9. If desired, they may be implanted at multiple sites in the patient, either at one time or over a period of months or years. The dosage may be repeated as needed.
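
The dose ranges above can be expressed as a small range check. The sketch encodes the stated usual range (1 x 10^4 to 5 x 10^10 cells) and preferred range (1 x 10^8 to 1 x 10^9 cells); splitting a dose evenly across implantation sites is a hypothetical simplification, since the text does not say how cells are divided among sites.

```python
# Dose ranges stated for an adult human or similarly sized animal.
USUAL_RANGE = (1e4, 5e10)      # number of cells
PREFERRED_RANGE = (1e8, 1e9)   # number of cells


def classify_dose(total_cells: float) -> str:
    """Place a total implanted cell number relative to the stated ranges."""
    lo, hi = PREFERRED_RANGE
    if lo <= total_cells <= hi:
        return "preferred"
    lo, hi = USUAL_RANGE
    if lo <= total_cells <= hi:
        return "usual"
    return "outside stated range"


def cells_per_site(total_cells: float, n_sites: int) -> float:
    """Cells per site if a dose is split evenly across n_sites (an assumption)."""
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    return total_cells / n_sites
```

For example, a preferred-range dose of 6 x 10^8 cells split evenly over three sites gives 2 x 10^8 cells per site.
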

Deposit

Under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for the Purpose of Patent Procedure, the deposit of plasmid pA2HB was made with the American Type Culture Collection (ATCC) of Rockville, MD, USA on May 12, 1998. The deposit was given Accession Number 209872.

Applicants' assignee, Transkaryotic Therapies, Inc., represents that the ATCC is a depository affording permanence of the deposit and ready accessibility thereto by the public if a patent is granted. All restrictions on the availability to the public of the material so deposited will be irrevocably removed upon the granting of a patent. The material will be available during the pendency of the patent application to one determined by the Commissioner to be entitled thereto under 37 CFR 1.14 and 35 U.S.C. §122. The deposited material will be maintained with all the care necessary to keep it viable and uncontaminated for a period of at least five years after the most recent request for the furnishing of a sample of the deposited material, and in any case, for a period of at least thirty (30) years after the date of deposit or for the enforceable life of the patent, whichever period is longer. Applicants' assignee acknowledges its duty to replace the deposit should the depository be unable to furnish a sample when requested due to the condition of the deposit.

Other Embodiments 
It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not to limit the scope of the invention, which is defined by the scope of the appended claims.

Other aspects, advantages, and modifications are within the scope of the following claims.

What is claimed is: 
1. A DNA construct that alters expression of an
endogenous IFNA2 gene in a mammalian cell upon
integration into the genome of the cell via homologous
recombination, the construct comprising (i) a targeting
sequence containing at least 20 contiguous nucleotides of
SEQ ID NO: 12, and (ii) a transcriptional regulatory
sequence.

2. The DNA construct of claim 1, wherein the
construct further comprises an exon, a splice-donor site,
an intron and a splice-acceptor site.

3. The DNA construct of claim 1, wherein the
construct further comprises a selectable marker gene.

4. The DNA construct of claim 1, wherein the
targeting sequence contains at least 50 contiguous
nucleotides of SEQ ID NO: 12.

5. A DNA construct that alters expression of an
endogenous IFNA2 gene in a mammalian cell upon
integration into the genome of the cell via homologous
recombination, the construct comprising (i) a targeting
sequence containing at least 20 contiguous nucleotides of
SEQ ID NO: 7, and (ii) a transcriptional regulatory
sequence.

6. The DNA construct of claim 5, wherein the
construct further comprises an exon, a splice-donor site,
an intron and a splice-acceptor site.
7. A DNA construct that alters expression of an
endogenous IFNA2 gene in a mammalian cell upon
integration into the genome of the cell via homologous
recombination, the construct comprising (i) a targeting
sequence containing at least 20 contiguous nucleotides of
SEQ ID NO: 8, and (ii) a transcriptional regulatory
sequence.

8. The DNA construct of claim 7, wherein the
construct further comprises an exon, a splice-donor site,
an intron and a splice-acceptor site.



9. An isolated nucleic acid comprising at least
20 contiguous nucleotides of SEQ ID NO: 12 or its
complement, wherein the isolated nucleic acid does not
encode full-length interferon-α2.

10. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises at least 50
contiguous nucleotides of SEQ ID NO: 12 or its complement.

11. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises at least 100
contiguous nucleotides of SEQ ID NO: 12 or its complement.

12. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises at least 200
contiguous nucleotides of SEQ ID NO: 12 or its complement.

13. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises SEQ ID NO: 7 or its
complement.

14. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises SEQ ID NO: 8 or its
complement.

15. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises SEQ ID NO: 12 or its
complement.

16. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises at least 20
contiguous nucleotides of SEQ ID NO: 7 or its complement.

17. The isolated nucleic acid of claim 9, wherein
the isolated nucleic acid comprises at least 20
contiguous nucleotides of SEQ ID NO: 8 or its complement.

18. An isolated nucleic acid comprising a strand
that comprises a nucleotide sequence that (i) is at least
100 nucleotides in length and (ii) hybridizes under
highly stringent conditions with SEQ ID NO: 12, or the
complement of SEQ ID NO: 12.

19. The isolated nucleic acid of claim 18,
wherein the nucleotide sequence is at least 200
nucleotides in length.

20. The isolated nucleic acid of claim 18,
wherein the nucleotide sequence is at least 400
nucleotides in length.

21. The isolated nucleic acid of claim 18,
wherein the nucleotide sequence is at least 1,000
nucleotides in length.
22. An isolated nucleic acid comprising a strand
that comprises a nucleotide sequence that (i) is at least
100 nucleotides in length and (ii) hybridizes under
highly stringent conditions with SEQ ID NO: 7, or the
complement of SEQ ID NO: 7.

23. An isolated nucleic acid comprising a strand
that comprises a nucleotide sequence that (i) is at least
50 nucleotides in length and (ii) hybridizes under highly
stringent conditions with SEQ ID NO: 8, or the complement
of SEQ ID NO: 8.

24. An isolated nucleic acid comprising a strand
that comprises a nucleotide sequence that (i) is at least
50 nucleotides in length and (ii) shares at least 80%
sequence identity with a fragment of SEQ ID NO: 12 having
the same length as the nucleotide sequence.

25. The isolated nucleic acid of claim 24,
wherein the nucleotide sequence is at least 100
nucleotides in length.

26. The isolated nucleic acid of claim 24,
wherein the fragment is a part or all of SEQ ID NO: 7.

27. The isolated nucleic acid of claim 24,
wherein the fragment is a part or all of SEQ ID NO: 8.

28. A homologously recombinant mammalian cell
stably transfected with the DNA construct of claim 1, the
DNA construct having undergone homologous recombination
with genomic DNA upstream of the ATG initiation codon of
an endogenous IFNA2 coding sequence.

29. A homologously recombinant mammalian cell
stably transfected with the DNA construct of claim 2, the
DNA construct having undergone homologous recombination
with genomic DNA upstream of the ATG initiation codon of
an endogenous IFNA2 coding sequence.

30. A homologously recombinant cell stably
transfected with the DNA construct of claim 3, the DNA
construct having undergone homologous recombination with
genomic DNA upstream of the ATG initiation codon of an
endogenous IFNA2 coding sequence.

31. A homologously recombinant cell stably
transfected with the DNA construct of claim 4, the DNA
construct having undergone homologous recombination with
genomic DNA upstream of the ATG initiation codon of an
endogenous IFNA2 coding sequence.

32. A method of altering expression of an
endogenous IFNA2 gene in a mammalian cell, the method
comprising introducing the DNA construct of claim 1 into
the cell and maintaining the cell under conditions which
permit homologous recombination to occur between the
construct and a target site 5' to the coding sequence of
the endogenous IFNA2 gene.

33. A method of delivering IFNA2 to an animal,
comprising
providing the cell of claim 28; and
implanting the cell in the animal, wherein the
cell secretes IFNA2.

34. A method of delivering IFNA2 to an animal,
comprising
providing the cell of claim 29; and
implanting the cell in the animal, wherein the
cell secretes IFNA2.

35. A method of delivering IFNA2 to an animal,
comprising
providing the cell of claim 30; and
implanting the cell in the animal, wherein the
cell secretes IFNA2.

36. A method of delivering IFNA2 to an animal,
comprising
providing the cell of claim 31; and
implanting the cell in the animal, wherein the
cell secretes IFNA2.

37. A method of producing IFNA2, comprising
providing the cell of claim 28, and
culturing the cell in vitro under conditions which
permit the cell to express and secrete IFNA2.

38. A method of producing IFNA2, comprising
providing the cell of claim 29, and
culturing the cell in vitro under conditions which
permit the cell to express and secrete IFNA2.

39. A method of producing IFNA2, comprising
providing the cell of claim 30, and
culturing the cell in vitro under conditions which
permit the cell to express and secrete IFNA2.

40. A method of producing IFNA2, comprising
providing the cell of claim 31, and
culturing the cell in vitro under conditions which
permit the cell to express and secrete IFNA2.

41. A DNA construct that alters expression of an
endogenous IFNA2 gene in a mammalian cell upon
integration into the genome of the cell via homologous
recombination, the construct comprising (i) a targeting
sequence containing at least 20 contiguous nucleotides of
one or more of SEQ ID NOs: 16, 17, 18, and 19, and (ii) a
transcriptional regulatory sequence.

42. An isolated nucleic acid comprising at least
20 contiguous nucleotides of one or more of
SEQ ID NOs: 16, 17, 18, and 19, or the complement of one
or more of SEQ ID NOs: 16, 17, 18, and 19, wherein the
isolated nucleic acid does not encode full-length
interferon-α2.

43. A homologously recombinant mammalian cell,
stably transfected with the DNA construct of claim 39,
the DNA construct having undergone homologous
recombination with genomic DNA upstream of the ATG
initiation codon of an endogenous IFNA2 coding sequence.

44. A method of altering expression of an
endogenous IFNA2 gene in a mammalian cell, the method
comprising introducing the DNA construct of claim 39 into
the cell and maintaining the cell under conditions which
permit homologous recombination to occur between the
construct and a target site 5' to the coding sequence of
the endogenous IFNA2 gene.

45. A method of delivering IFNA2 to an animal,
comprising
providing the cell of claim 41; and
implanting the cell in the animal, wherein the
cell secretes IFNA2.

46. A method of producing IFNA2, comprising
providing the cell of claim 41, and
culturing the cell in vitro under conditions which
permit the cell to express and secrete IFNA2.
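The claims define nucleic acids by sequence relationships such as "at least 20 contiguous nucleotides of SEQ ID NO: 12 or its complement" (claims 9-17 and 42) and "at least 80% sequence identity with a fragment of SEQ ID NO: 12 having the same length" (claims 24-27). The Python sketch below shows one way to screen a candidate sequence for those two relationships. It rests on assumptions not stated in the application: "complement" is read as the reverse complement written 5' to 3', identity is counted position by position without gaps, and the function names are hypothetical. It does not model hybridization stringency (claims 18-23).

```python
# Sketch only: screens a candidate DNA sequence against two claim-style tests.
# Assumptions (not from the application): sequences are 5'->3' strings of
# A/C/G/T, "complement" means the reverse complement, identity is ungapped.

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def shares_contiguous_run(candidate: str, reference: str, n: int) -> bool:
    """True if `candidate` contains any n-nucleotide stretch of `reference`
    or of its reverse complement."""
    if n <= 0:
        raise ValueError("n must be positive")
    cand = candidate.upper()
    ref = reference.upper()
    for strand in (ref, reverse_complement(ref)):
        # All n-mers of this strand of the reference.
        kmers = {strand[i:i + n] for i in range(len(strand) - n + 1)}
        if any(cand[j:j + n] in kmers for j in range(len(cand) - n + 1)):
            return True
    return False


def best_window_identity(query: str, reference: str) -> float:
    """Highest ungapped percent identity between `query` and any fragment of
    `reference` of the same length (0.0 if the query is empty or too long)."""
    q = query.upper()
    ref = reference.upper()
    if not q or len(q) > len(ref):
        return 0.0
    best = 0
    for start in range(len(ref) - len(q) + 1):
        window = ref[start:start + len(q)]
        best = max(best, sum(a == b for a, b in zip(q, window)))
    return 100.0 * best / len(q)


if __name__ == "__main__":
    seq16 = "AGGAAAGGTGGAAGATTGATGATTTGGGG"  # SEQ ID NO: 16 (29 nt)
    probe = "CCC" + seq16[9:29] + "CCC"      # embeds a 20-nt stretch
    print(shares_contiguous_run(probe, seq16, 20))
    print(best_window_identity("C" + seq16[6:25], seq16))
```

With SEQ ID NO: 16 as the reference, a 20-nucleotide stretch embedded in unrelated flanking sequence passes the contiguous-run test, and a 20-nucleotide copy carrying one substitution scores 95% identity against its same-length fragment.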
-510 GCGCCTCTTA TGTACCCACA AAAATCTATT TTCAAAAAAG TTGCTCTAAG AATATAGTTA TCAAGTTAAG
-440 TAAAATGTCA ATAGCCTTTT AATTTAATTT TTAATTGTTT TATCATTCTT TGCAATAATA AAACATTAAC
-370 TTTATACTTT TTAATTTAAT GTATAGAATA GAGATATACA TAGGATATCT AAATAGATAC ACAGTGTATA

IFN7 (-262)

-300 TGTGATTAAA ATATAATGGG AGATTCAATC AGAAAAAAGT TTCTAAAAAG GCTCTGGGGT AAAAGAGGAA

-230 ggaaacaaca acgaaaaaaa .g-ggtgaga aaaacagccg AAAACCCATG TAAAGAGTGT ATAAAGAAAG
-160 CAAAAAGAGA AGTAGAAAGT AACACAGGGG CATTTGGAAA ATGTAAACGA GTATGTTCCC TATTTAAGGC

Cap (-67)

-90 TAGGCACAAA GCAAGGTCTT CAGAGAACCT GGAGCCTAAG GTTTAGGCTC ACCCATTTCA ACCAGTCTAG

ATG (1)

-20 CAGCATCTGC AACATCTACA ATGGCCTTGA CCTTTGCTTT ACTGGTGGCC CTCCTGGTGC TCAGCTGCAA

1>MetAlaLeuT hrPheAlaLe uLeuValAla LeuLeuValL euSerCysLy
IFN6 (70)
Mature (70)

51 GTCAAGCTGC TCTGTGGGCT GTGATCTGCC TCAAACCCAC AGCCTGGGTA GCAGGAGGAC CTTGATGCTC

17>sSerSerCys SerValGlyC ysAspLeuPr oGlnThrHis SerLeuGlyS erArgArgTh rLeuMetLeu

121 CTGGCACAGA TGAGGAGAAT CTCTCTTTTC TCCTGCTTGA AGGACAGACA TGACTTTGGA TTTCCCCAGG

41>LeuAlaGlnM etArgArgIl eSerLeuPhe SerCysLeuL ysAspArgHi sAspPheGly PheProGlnG
191 AGGAGTTTGG CAACCAGTTC CAAAAGGCTG AAACCATCCC TGTCCTCCAT GAGATGATCC AGCAGATCTT

64>luGluPheGl yAsnGlnPhe GlnLysAlaG luThrIlePr oValLeuHis GluMetIleG lnGlnIlePh
261 CAATCTCTTC AGCACAAAGG ACTCATCTGC TGCTTGGGAT GAGACCCTCC TAGACAAATT CTACACTGAA

87>eAsnLeuPhe SerThrLysA spSerSerAl aAlaTrpAsp GluThrLeuL euAspLysPh eTyrThrGlu
331 CTCTACCAGC AGCTGAATGA CCTGGAAGCC TGTGTGATAC AGGGGGTGGG GGTGACAGAG ACTCCCCTGA

111>LeuTyrGlnG lnLeuAsnAs pLeuGluAla CysValIleG lnGlyValGl yValThrGlu ThrProLeuM
401 TGAAGGAGGA CTCCATTCTG GCTGTGAGGA AATACTTCCA AAGAATCACT CTCTATCTGA AAGAGAAGAA

134>etLysGluAs pSerIleLeu AlaValArgL ysTyrPheGl nArgIleThr LeuTyrLeuL ysGluLysLy
471 ATACAGCCCT TGTGCCTGGG AGGTTGTCAG AGCAGAAATC ATGAGATCTT TTTCTTTGTC AACAAACTTG

157>sTyrSerPro CysAlaTrpG luValValAr gAlaGluIle MetArgSerP heSerLeuSe rThrAsnLeu

Stop codon (565)

541 CAAGAAAGTT TAAGAAGTAA GGAATGAAAA CTGGTTCAAC ATGGAAATGA TTTTCATTGA TTCGTATGCC

181>GlnGluSerL euArgSerLy sGlu*** (SEQ ID NO: 2)

IFN1 (639)

611 AGCTCACCTT TTTATGATCT GCCATTTCAA AGACTCATGT TTCTGCTATG ACCATGACAC GATTTAAATC

681 TTTTCAAATG TTTTTAGGAG TATTAATCAA CATTGTATTC AGCTCTTAAG GCACTAGTCC CTTACAGAGG
751 ACCATGCTGA CTGATCCATT ATCTATTTAA ATATTTTTAA AATATTATTT ATTTAACTAT TTATAAAACA

IFN2 (854)

821 ACTTATTTTT GTTCATATTA TGTCATGTGC ACCTTTGCAC AGTGGTTAAT GTAATAAAAT GTGTTCTTTG

polyA site (899)

891 TATTTGGTAA ATTTATTTTG TGTTGTTCAT TGAACTTTTG CTATGGAACT TTTGTACTTG TTTATTCTTT
961 AAAATGAAAT TCCAAGCCTA ATTGTGCAAC CTGATTACAG AATAACTGGT ACACTTCATT TGTCCATCAA

polyA site (1074)

1031 TATTATATTC AAGATATAAG TAAAAATAAA CTTTCTGTAA ACCAAGTTGT ATGTTGTACT CAAGATAACA
1101 GGGTGAACCT AACAAATACA ATTCTGCTCT CTTGTGTATT TGATTTTTGT ATGAAAAAAA CTAAAAATGG
1171 TAATCATACT TAATTATCAG TTATGGTAAA TGGTATGAAG AGAAGAAGGA ACG (SEQ ID NO: 1)

FIG. 1 
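FIG. 1 marks the ATG at position 1, the start of the mature protein at nucleotide 70, and the stop codon at position 565, with the three-letter translation printed under the sequence. The Python sketch below shows how that annotation follows from the standard genetic code: the coding sequence is read in frame from the ATG until the first stop codon, and the precursor is split at nucleotide 70 into a 23-residue signal peptide and the mature chain. Only nucleotides 1-90 from the figure are transcribed here, and the helper names (`translate`, `split_precursor`, `CDS_1_90`) are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate in frame from the first base, stopping at the first stop codon."""
    seq = cds.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def split_precursor(cds: str, mature_start_nt: int = 70) -> Tuple[str, str]:
    """Split the translated precursor into (signal peptide, mature chain).

    mature_start_nt is a 1-based position counted from the A of the ATG and
    must fall on a codon boundary (nucleotide 70 -> residue 24).
    """
    if mature_start_nt < 1 or (mature_start_nt - 1) % 3:
        raise ValueError("mature start must fall on a codon boundary")
    protein = translate(cds)
    cut = (mature_start_nt - 1) // 3
    return protein[:cut], protein[cut:]


# Nucleotides 1-90 of the FIG. 1 coding sequence, in the figure's 10-nt groups.
CDS_1_90 = (
    "ATGGCCTTGA" "CCTTTGCTTT" "ACTGGTGGCC" "CTCCTGGTGC" "TCAGCTGCAA"
    "GTCAAGCTGC" "TCTGTGGGCT" "GTGATCTGCC" "TCAAACCCAC"
)

if __name__ == "__main__":
    signal, mature = split_precursor(CDS_1_90)
    print(signal)  # MALTFALLVALLVLSCKSSCSVG
    print(mature)  # CDLPQTH
```

The signal peptide printed here (Met-Ala-Leu-Thr-Phe-Ala-Leu-Leu-Val-Ala-Leu-Leu-Val-Leu-Ser-Cys-Lys-Ser-Ser-Cys-Ser-Val-Gly) matches the three-letter translation shown in the figure, and the mature chain begins Cys-Asp-Leu-Pro-Gln-Thr-His.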



-583 CATTGGATAC TCCATCACCT GCTGTCATAT TATGAATGTC TGCCTATATA AATATTCACT ATTCCATAAC
-513 ACAGCGCCTC TTATGTACCC ACAAAAATCT ATTTTCAAAA AAGTTGCTCT AAGAATATAG TTATCAAGTT
-443 AAGTAAAATG TCAATAGCCT TTTAATTTAA TTTTTAATTG TTTTATCATT CTTTGCAATA ATAAAACATT

-373 aactttatac tttttaattt aatgtataga atagagatat acataggata tctaaataga tacacagtgt

-303 atatgtgatt aaaatataat gggagattca atcagaaaaa agtttctaaa aaggctctgg ggtaaaagag

"?« acaatgaaaa aaacc-.ggcg agaaaaacag ccgAAAACCC ATGTAAAGAG TGTATAAAGA 

-163 AAGCAAAAAG AGAAGTAGAA AGTAACACAG GGGCATTTGG AAAATGTAAA CGAGTATGTT CCCTATTTAA

Cap (-67)

-93 GGCTAGGCAC AAAGCAAGGT CTTCAGAGAA CCTGGAGCCT AAGGTTTAGG CTCACCCATT TCAACCAGTC

ATG (1)

-23 TAGCAGCATC TGCAACATCT ACAATGGCCT TGACCTTTGC TTTACTGGTG GCCCTCCTGG TGCTCAGCTG

1>MetAlaL euThrPheAl aLeuLeuVal AlaLeuLeuV alLeuSerCy
Mature (70)

48 CAAGTCAAGC TGCTCTGTGG GCTGTGATCT GCCTCAAACC CACAGCCTGG GTAGCAGGAG GACCTTGATG
16>sLysSerSer CysSerValG ly
1>CysAspLe uProGlnThr HisSerLeuG lySerArgAr gThrLeuMet

118 CTCCTGGCAC AGATGAGGAG AATCTCTCTT TTCTCCTGCT TGAAGGACAG ACATGACTTT GGATTTCCCC

17>LeuLeuAlaG lnMetArgAr gIleSerLeu PheSerCysL euLysAspAr gHisAspPhe GlyPheProG

188 AGGAGGAGTT TGGCAACCAG TTCCAAAAGG CTGAAACCAT CCCTGTCCTG CATGAGATGA TCCAGCAGAT

40>lnGluGluPh eGlyAsnGln PheGlnLysA laGluThrIl eProValLeu HisGluMetI leGlnGlnIl
258 CTTCAATCTC TTCAGCACAA AGGACTCATC TGCTGCTTGG GATGAGACCC TCCTAGACAA ATTCTACACT

63>ePheAsnLeu PheSerThrL ysAspSerSe rAlaAlaTrp AspGluThrL euLeuAspLy sPheTyrThr
328 GAACTCTACC AGCAGCTGAA TGACCTGGAA GCCTGTGTGA TACAGGGGGT GGGGGTGACA GAGACTCCCC

87>GluLeuTyrG lnGlnLeuAs nAspLeuGlu AlaCysValI leGlnGlyVa lGlyValThr GluThrProL
398 TGATGAAGGA GGACTCCATT CTGGCTGTGA GGAAATACTT CCAAAGAATC ACTCTCTATC TGAAAGAGAA

110>euMetLysGl uAspSerIle LeuAlaValA rgLysTyrPh eGlnArgIle ThrLeuTyrL euLysGluLy

468 GAAATACAGC CCTTGTGCCT GGGAGGTTGT CAGAGCAGAA ATCATGAGAT CTTTTTCTTT GTCAACAAAC

133>sLysTyrSer ProCysAlaT rpGluValVa lArgAlaGlu IleMetArgS erPheSerLe uSerThrAsn
538 TTGCAAGAAA GTTTAAGAAG TAAGGAATGA AAACTGGTTC AACATGGAAA TGATTTTCAT TGATTCGTAT

157>LeuGlnGluS erLeuArgSe rLysGlu*** (SEQ ID NO: 2)

608 GCCAGCTCAC CTTTTTATGA TCTGCCATTT CAAAGACTCA TGTTTCTGCT ATGACCATGA CACGATTTAA
678 ATCTTTTCAA ATCTTTTTAG GAGTATTAAT CAACATTGTA TTCAGCTCTT AAGGCACTAG TCCCTTACAG
748 AGGACCATGC TGACTGATCC ATTATCTATT TAAATATTTT TAAAATATTA TTTATTTAAC TATTTATAAA
818 ACAACTTATT TTTGTTCATA TTATGTCATG TGCACCTTTG CACAGTGGTT AATGTAATAA AATGTGTTCT
polyA site (899)

888 TTGTATTTGG TAAATTTATT TTGTGTTGTT CATTGAACTT TTGCTATGGA ACTTTTGTAC TTGTTTATTC
958 TTTAAAATGA AATTCCAAGC CTAATTGTGC AACCTGATTA CAGAATAACT GGTACACTTC ATTTGTCCAT

polyA site (1074)

1028 CAATATTATA TTCAAGATAT AAGTAAAAAT AAACTTTCTG TAAACCAAGT TGTATGTTGT ACTCAAGATA

1098 ACAGGGTGAA CCTAACAAAT ACAATTCTGC TCTCTTGTGT ATTTGATTTT TGTATGAAAA AAACTAAAAA

1168 TGGTAATCAT ACTTAATTAT CAGTTATGGT AAATGGTATG AAGAGAAGAA GGAACG (SEQ ID NO: 9)

FIG. 4 
FIG. 6A



FIG. 6B 

FIG. 6 



FIG. 7A 



FIG. 7B 

FIG. 7 
HindIII (-4073)

-4074 AAGCTTTTAT AGGTGTAAAT TTTCCACTTA GTACTGCTTT TGTAATGTTG TCTTTTTATT
-4014 TTCATTTATC TCAAGATGTT TTCTAATTTC TCTTGACTTC CTTCTTAAAT TCTTACCTCA
-3954 TGTAGACATA CATTTTTGGC CCTATGCATT GGGATGCAAA ACCAGACTAA TTTACTTTGT
-3894 ACAAAAAGAA AAATGAGAAA GAAATATATT TGGTCTTGTG AGCACTATAT GGAAATACTT
-3834 TATATTCCAT TTGTTTCATC ATATTCATAT ATCCCTTTAC TAACATAAAG CTGAAGGTGA
-3774 ATAAAAAAAT CAGGGTTAGC CAAACAAATT TTCATGGTCA AATACCACAT AAAAAGTAAA
-3714 TATACTTAAG TTCCCAGCAA AATCTGAATT GAACGTAGAC AAAATGCTCA TTTCTCAGTG
-3654 TTTGACAGAC TTAACAGTTT GAGCCAATAA AAATGTACTG ACTAGATAAA CTACTAAAAG
-3594 TTGTTAATTT TTGCAATGTA TATTTCTGAA AAGAAAGTTT ATCTATTATA GAAATTCCTG
-3534 TGCCCATTTA AGAACTTTGA GCATTTTAAT TGTTTAATAA TATAGTTTAA TTGCATCATG
-3474 AAAATAATCA ATAATACAAT TTATTTGGTT TATTTAAAAA AACTGATTCT TTCTGCTCTC
-3414 TCTATATATA GACTGATTTT ATACTAATGT TGCCTAAAGA TCACCAAATT GTTTGAAGCC
-3354 TAGGTTTCTG AGGGATGGAA AATGATGTCA CAACTATTTA CAGTTCACAC ACACATTCTG
-3294 GGGATTTAAT ACATCCTTTA CAAGTGCAGG AAAGGTGGAA GATTGATGAT TTGGGGGAAT
-3234 TAGAGCTACC ACACCCCAGA GGGTGGTATG GTATGTTGTC TGTTGTGAGC TGTGTGAATC
-3174 AGAGAGTTTG ATTTAGACAT ATATTTAGAA AGAGGAAAGA TGAACCAATC AAAAATAATA
-3114 ACTATAATGA CTTTTCAAGA TATAGACAAT ACAGTTAAGA TATAAATGGA AACAAAAAAA
-3054 GTTAAAAGTG GGGAGATGAA GTCTGATTTT TTGGTTTTTT TTTTTTTTTG CTTTTTTGTT
-2994 TGTTTATGTA ATCAGTGTTA CCAGTTTAAA ATAATGGGTT ATAAGACACT ATATGCAAGC
-2934 CTCATGGTAA CCTCCAATCT AAAACATACA ACAAATACAC ACAAAATAAA AAGGAGAAAT
-2874 TAAAACACAC CACCAGAGAA AATCACCTAC ATTAAAAGAA AGACAAATAG GAAGAAAATA
-2814 AGAAAGAGAA GGCCATCAAA TAATCAGAAA ATGAATAACA AAATGACAGG AATAAGTCCT
-2754 CATAAATAAT AACATTGAAT GTAAATGGAC TAAGCTCTCC AATGAAAGAC AGGGAGTGGC
-2694 TGAATGTATT TTAAAAAAAA TATTACACCG AGCTGTGCGT GGTGTCTCAC ACCTATAATC
-2634 CCAGCATTTT GGGAGACTGA GCCGGGTGGA TCACTTGAGC CCAGGAGTTC GAGACCAGCC
-2574 TGGCCAACAT GGCAAAACCC TGTCTCTACT AAAAATACAA AAAATTAGCT GAACATGGTG
-2514 GCACATGCCT GTGGTTCCAG CTACTAGAGA GGCTGAGGCA GAAGAATTGC TTGAACTTGG
-2454 GAGGTGGAGG TTGCAGTGAG CTAAGATTGA TGGAGCCACT GCACCCCAGC CTAGGTGACA
-2394 GAATAAGACT CTGCCTCAAA AAAAAAAAGC AAAACAAAAC AAAACAAAAA ACCCTTAGAC
-2334 CCAATGATTC ATTGCCTACA AGAAGTATGC TTCACCTTTA AAGACACATA TAGACTGAAG
-2274 GTAAAGGGAT GGAAAAATAT TCTATGCCTA TGGAAACAAA CAAAAAGAAG CAGAAGCTAC
-2214 ATTTATATCA GACAAAATAG ACTGCAAGAC AAAAACTATG AAAAGAGAGA AAGAAGGTCA
-2154 TTATATAGTG ATAAAGGGGT CCATTTAGCA AGAGCATTTA ACAATTCTAA ATATATATTC
-2094 ACCCAATACT GGAGTACTCA GGTATATAAA GCAAATATTA TTAGAGCCAA AGAGAGAGAT
-2034 AGACAGACCC CCATACAATA ATAACTGGAG ACTTCAACAC CCCACTTTCA GCATTGGACA
-1974 GATCATCCAG ACAGAAAATT AACAAACATC AAATTTCATC TGCACCATAG GTCAAATGGA
-1914 CCTAGTAGAT ATTTACAGAA CATTTGATCC AACAGCTGTA GAATACACAT TCTTCTCCTC
-1854 AGCACATGGA TAATTCTCAA GGATATACCA AATGCTAGGT CACAAAACAA ATCTTAAAAT
-1794 TTAGAAAAAA AGTGAAATAA TATCAAACGT TTTCTCTCAC CACAGACTAA GAAAAAAAGA
-1734 AGTCCCAAAT AAATACAATC TGAGATAAAA AAGGAGACGA GACAACCAAT ACCACAAAAA
-1674 ATTAAAGGAT CATTAGAAGA TACTATGAAA CTATATGCTA ATAAATTGGA AAACCTGAAT
-1614 AAAATAGATA ATTCCTAGAA ACATACAACA TACTGGTCTG TTCAGGTTTT GTATTTTTTC
-1554 ATAGTACCAT GAAGAAATAC AAGAATTGTT TCTAGAACCA TTCTTGTATT TCTTCATGGT
-1494 TTTTGTATTT CTTCATGGAA CCATGAAGAA ATACAAAATG TGAACAGGCC AATAACAAGT
-1434 AATGAGACAG AAGCCATACT AAAAAGTATC CCAGAAAAGA ACTCAGGATC TGATGGCTTC
-1374 ACTGATGAAT TTTGCCAAAT ATTTAAAAAA CTAATACCAA TCCAACTCAA ATTATTAAAA
-1314 AAATAGAGGT GGACAGAATC TTTCCAAATG TATTCTATGA GGCCAGTGTT TTTTCTGATT
-1254 GAATCTCCCA TTATATTTTA ATCACATATA AAACCAGAGA AAGACACATT AAAAAGAAAG
-1194 AAAACTGTAG GCCAATATCT CTGATGAACA TTGATGCAGA AATCCTCAAC AACAAATTAG
-1134 CAAACTGAAT TCAACAACAC ATTAAAACAA TCATTCATCA TGACCAAGTG GAATTTGTCC
-1074 TAGAGATTCA AGTGTGGTTA GGTATGTGCA GATCAATGGG TTTAATGTTG TCCAATGAAC
-1014 ATAATGTCCT CCAGCTCCAT CCATGTTCTT GCAAATGACA GGATCTCATT CTTTTTTATG
-954 GCTAAGTAGT ACTCCATTGT GTATAAGTGC CATATTTTCT TTATCCATTC ATCTGTTAGA
-894 CACCTAAGTT GCTTCCAAAT CTTAGCTATT GTGAATAGTG CTGCAATAAA CATGGGAGTG
-834 TAAATATTTT GTTGACATAC TGATTTCATT TCCTTTGGAT AAATACCCAG TAGTGGGATT
-774 GCTGGATCAT ATGGGGGAAA ATGGAGATGG CTAACGGGCT CAAAAATATA GTTAGAAAAA
-714 ATGAATATGA TTTAGTATTC GATAGCACAA TAGGATGACT ACTGTTAATG ATAATTTATT
-654 ATATATTATA AAATAACTAA AATAGTATAA ATGGGATGTA TGTAGCAGAG AGAAATGATA
583

-594 AATGTTTGAA GCATTGGATA CTCCATCACC TGCTGTGATA TTATGAATGT CTGCCTATAT
-534 AAATATTCAC TATTCCATAA CACAGCGCCT CTTATGTACC CACAAAAATC
-484 TATTTTCAAA AAAGTTGCTC TAAGAATATA GTTATCAAGT TAAGTAAAAT
-434 GTCAATAGCC TTTTAATTTA ATTTTTAATT GTTTTATCAT TCTTTGCAAT
-384 AATAAAACAT TAACTTTATA CTTTTTAATT TAATGTATAG AATAGAGATA
-334 TACATAGGAT ATCTAAATAG ATACACAGTG TATATGTGAT TAAAATATAA
-284 TGGGAGATTC AATCAGAAAA AAGTTTCTAA AAAGGCTCTG GGGTAAAAGA
-234 GGAAGGAAAC A ATAATftAAA AAAATRTSST GAGAAAAACA GCCGAAAACC
-184 CATGTAAAGA GTGTATAAAG AAAGCAAAAA GAGAAGTAGA AAGTAACACA
-134 GGGGCATTTG GAAAATGTAA ACGAGTATGT TCCCTATTTA AGGCTAGGCA

-84 CAAAGCAAGG TCTTCAGAGA ACCTGGAGCC TAAGGTTTAG GCTCACCCAT
-34 TTCAACCAGT CTAGCAGCAT CTGCAACATC TACAATGGCC TTGACCTTTG

FIG. 6B 
<210> 19
<211> 14
<212> DNA
<213> Homo sapiens
<220>
<221> misc_feature
<222> (1)...(14)
<223> n = A,T,C or G
<400> 19
yyyyyyyyyy nyag                                                  14
<210> 16
<211> 29
<212> DNA
<213> Homo sapiens
<400> 16
aggaaaggtg gaagattgat gatttgggg                                  29

<210> 17
<211> 105
<212> DNA
<213> Homo sapiens
<400> 17
ggggaattag agctaccaca ccccagaggg tggtatggta tgttgtctgt tgtgagctgt    60
gtgaatcaga gagtttgatt tagacatata tttagaaaga ggaaa                   105

<210> 18
<211> 2629
<212> DNA
<213> Homo sapiens
<400> 18
aaagatgaac caatcaaaaa taataactat aatgactttt caagatatag acaatacagt    60
taagatataa atggaaacaa aaaaagttaa aagtggggag atgaagtctg attttttggt   120
tttttttttt ttttgctttt ttgtttgttt atgtaatcag tgttaccagt ttaaaataat   180
gggttataag acactatatg caagcctcat ggtaacctcc aatctaaaac atacaacaaa   240
tacacacaaa ataaaaagga gaaattaaaa cacaccacca gagaaaatca cctacattaa   300
aagaaagaca aataggaaga aaataagaaa gagaaggcca tcaaataatc agaaaatgaa   360
taacaaaatg acaggaataa gtcctcataa ataataacat tgaatgtaaa tggactaagc   420
tctccaatga aagacaggga gtggctgaat gtattttaaa aaaaatatta caccgagctg   480
tgcgtggtgt ctcacaccta taatcccagc attttgggag actgagccgg gtggatcact   540
tgagcccagg agttcgagac cagcctggcc aacatggcaa aaccctgtct ctactaaaaa   600
tacaaaaaat tagctgaaca tggtggcaca tgcctgtggt tccagctact agagaggctg   660
aggcagaaga attgcttgaa cttgggaggt ggaggttgca gtgagctaag attgatggag   720
ccactgcacc ccagcctagg tgacagaata agactctgcc tcaaaaaaaa aaagcaaaac   780
aaaacaaaac aaaaaaccct tagacccaat gattcattgc ctacaagaag tatgcttcac   840
ctttaaagac acatatagac tgaaggtaaa gggatggaaa aatattctat gcctatggaa   900
acaaacaaaa agaagcagaa gctacattta tatcagacaa aatagactgc aagacaaaaa   960
ctatgaaaag agagaaagaa ggtcattata tagtgataaa ggggtccatt tagcaagagc  1020
atttaacaat tctaaatata tattcaccca atactggagt actcaggtat ataaagcaaa  1080
tattattaga gccaaagaga gagatagaca gacccccata caataataac tggagacttc  1140
aacaccccac tttcagcatt ggacagatca tccagacaga aaattaacaa acatcaaatt  1200
tcatctgcac cataggtcaa atggacctag tagatattta cagaacattt gatccaacag  1260
ctgtagaata cacattcttc tcctcagcac atggataatt ctcaaggata taccaaatgc  1320
taggtcacaa aacaaatctt aaaatttaga aaaaaagtga aataatatca aacgttttct  1380
ctcaccacag actaagaaaa aaagaagtcc caaataaata caatctgaga taaaaaagga  1440
gacgagacaa ccaataccac aaaaaattaa aggatcatta gaagatacta tgaaactata  1500
tgctaataaa ttggaaaacc tgaacaaaat agataattcc tagaaacata caacatactg  1560
gtctgttcag gttttgtatt ttttcatagt accatgaaga aatacaagaa ttgtttctag  1620
aaccattctt gtatttcttc atggtttttg tatttcttca tggaaccatg aagaaataca  1680
aaatgtgaac aggccaataa caagtaatga gacagaagcc atactaaaaa gtatcccaga  1740
aaagaactca ggatctgatg gcttcactga tgaattttgc caaatattta aaaaactaat  1800
accaatccaa ctcaaattat taaaaaaata gaggtggaca gaatctttcc aaatgtattc  1860
tatgaggcca gtgttttttc tgattgaatc tcccattata ttttaatcac atataaaacc  1920
agagaaagac acattaaaaa gaaagaaaac tgtaggccaa tatctctgat gaacattgat  1980
gcagaaatcc tcaacaacaa attagcaaac tgaattcaag aacacattaa aacaatcatt  2040
catcatgacc aagtggaatt tgtcctagag attcaagtgt ggttaggtat gtgcagatca  2100
atgggtttaa tgttgtccaa tgaacataat gtcctccagc tccatccatg ttcttgcaaa  2160
tgacaggatc tcattctttt ttatggctaa gtagtactcc attgtgtata agtgccatat  2220
tttctttatc cattcatctg ttagacacct aagttgcttc caaatcttag ctattgtgaa  2280
tagtgctgca ataaacatgg gagtgtaaat attttgttga catactgatt tcatttcctt  2340
tggataaata cccagtagtg ggattgctgg atcatatggg ggaaaatgga gatggctaac  2400
gggctcaaaa atatagttag aaaaaatgaa tatgatttag tattcgatag cacaatagga  2460
tgactactgt taatgataat ttattatata ttataaaata actaaaatag tataaatggg  2520
atgtatgtag cagagagaaa tgataaatgt ttgaagcatt ggatactcca tcacctgctg  2580
tgatattatg aatgtctgcc tatataaata ttcactattc cataacaca             2629
aaagacaaat aggaagaaaa taagaaagag aaggccatca aataatcaga aaatgaataa
caaaatgaca ggaataagtc ctcataaata ataacattga atgtaaatgg actaagctct
ccaatgaaag acagggagtg gctgaatgta ttttaaaaaa aatattacac cgagctgtgc
gtggtgtctc acacctataa tcccagcatt ttgggagact gagccgggtg gatcacttga
gcccaggagt tcgagaccag cctggccaac atggcaaaac cctgtctcta ctaaaaatac
aaaaaattag ctgaacatgg tggcacatgc ctgtggttcc agctactaga gaggctgagg
cagaagaatt gcttgaactt gggaggtgga ggttgcagtg agctaagatt gatggagcca
ctgcacccca gcctaggtga cagaataaga ctctgcctca aaaaaaaaaa gcaaaacaaa
acaaaacaaa aaacccttag acccaatgat tcattgccta caagaagtat gcttcacctt
taaagacaca tatagactga aggtaaaggg atggaaaaat attctatgcc tatggaaaca
aacaaaaaga agcagaagct acatttatat cagacaaaat agactgcaag acaaaaacta
tgaaaagaga gaaagaaggt cattatatag tgataaaggg gtccatttag caagagcatt
taacaattct aaatatatat tcacccaata ctggagtact caggtatata aagcaaatat
tattagagcc aaagagagag atagacagac ccccatacaa taataactgg agacttcaac
accccacttt cagcattgga cagatcatcc agacagaaaa ttaacaaaca tcaaatttca
tctgcaccat aggtcaaatg gacctagtag atatttacag aacatttgat ccaacagctg
tagaatacac attcttctcc tcagcacatg gataattctc aaggatatac caaatgctag
gtcacaaaac aaatcttaaa atttagaaaa aaagtgaaat aatatcaaac gttttctctc
accacagact aagaaaaaaa gaagtcccaa ataaatacaa tctgagataa aaaaggagac
gagacaacca ataccacaaa aaattaaagg atcattagaa gatactatga aactatatgc
taataaattg gaaaacctga acaaaataga taattcctag aaacatacaa catactggtc
tgttcaggtt ttgtattttt tcatagtacc atgaagaaat acaagaattg tttctagaac
cattcttgta tttcttcatg gtttttgtat ttcttcatgg aaccatgaag aaatacaaaa
tgtgaacagg ccaataacaa gtaatgagac agaagccata ctaaaaagta tcccagaaaa
gaactcagga tctgatggct tcactgatga attttgccaa atatttaaaa aactaatacc
aatccaactc aaattattaa aaaaatagag gtggacagaa tctttccaaa tgtattctat
gaggccagtg ttttttctga ttgaatctcc cattatattt taatcacata taaaaccaga
gaaagacaca ttaaaaagaa agaaaactgt aggccaatat ctctgatgaa cattgatgca
gaaatcctca acaacaaatt agcaaactga attcaagaac acattaaaac aatcattcat
catgaccaag tggaatttgt cctagagatt caagtgtggt taggtatgtg cagatcaatg
ggtttaatgt tgtccaatga acataatgtc ctccagctcc atccatgttc ttgcaaatga
caggatctca ttctttttta tggctaagta gtactccatt gtgtataagt gccatatttt
ctttatccat tcatctgtta gacacctaag ttgcttccaa atcttagcta ttgtgaatag
tgctgcaata aacatgggag tgtaaatatt ttgttgacat actgatttca tttcctttgg
ataaataccc agtagtggga ttgctggatc ata



<210> 14
<211> 137
<212> DNA
<213> Homo sapiens
<400> 14
gagaacctgg agcctaaggt ttaggctcac ccatttcaac cagtctagca gcatctgcaa
catctacaat ggccttgacc tttgctttac tggtggccct cctggtgctc agctgcaagt
caagctgctc tgtgggc

<210> 15
<211> 805
<212> DNA
<213> Homo sapiens
<400> 15
aagcttttat aggtgtaaat tttccactta gtactgcttt tgtaatgttg tctttttatt
ttcatttatc tcaagatgtt ttctaatttc tcttgacttc cttcttaaat tcttacctca
tgtagacata catttttggc cctatgcatt gggatgcaaa accagactaa tttactttgt
acaaaaagaa aaatgagaaa gaaatatatt tggtcttgtg agcactatat ggaaatactt
tatattccat ttgtttcatc atattcatat atccctttac taacataaag ctgaaggtga
ataaaaaaat cagggttagc caaacaaatt ttcatggtca aataccacat aaaaagtaaa
tatacttaag ttcccagcaa aatctgaatt gaacgtagac aaaatgctca tttctcagtg
tttgacagac ttaacagttt gagccaataa aaatgtactg actagataaa ctactaaaag
ttgttaattt ttgcaatgta tatttctgaa aagaaagttt atctattata gaaattcctg
tgcccattta agaactttga gcattttaat tgtttaataa tatagtttaa ttgcatcatg
aaaataatca ataatacaat ttatttggtt tatttaaaaa aactgattct ttctgctctc
tctatatata gactgatttt atactaatgt tgcctaaaga tcaccaaatt gtttgaagcc
taggtttctg agggatggaa aatgatgtca caactattta cagttcacac acacattctg
gggatttaat acatccttta caagt



1020 

1080 

1140 

1200 

1260 

1320 

1380 

1440 

1500 

1560 

1620 

1680 

1740 

1800 

1860 

1920 

1980 

2040 

2100 

2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3033 



60 
120 
137 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
805 
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tctgttgtga 
gatgaaccaa 
gatataaatg 
tttttttttt 
ttataagaca 
acacaaaata 
aaagacaaat 
caaaatgaca 
ccaatgaaag 

gtggtgtctc 

gcccaggagt 

aaaaaattag 

cagaagaatt 

ctgcacccca 

acaaaacaaa 

taaagacaca 

aacaaaaaga 

tgaaaagaga 

taacaattct 

tattagagcc 

accccacttt 

tctgcaccat 

tagaatacac 

gtcacaaaac 

accacagact 

gagacaacca 

taataaattg 

tgttcaggtt 

cattcttgta 

tgtgaacagg 

gaactcagga 

aatccaactc 

gaggccagtg 

gaaagacaca 

gaaatcctca 

catgaccaag 

ggtttaatgt 

caggatctca 

ctttatccat 

tgctgcaata 

ataaataccc 

ctcaaaaata 

ctactgttaa 

tatgtagcag 



gctgtgtgaa tcagagagtt 
tcaaaaataa taactataat 



gaaacaaaaa 
tgcttttttg 
ctatatgcaa 
aaaaggagaa 
aggaagaaaa 
ggaataagtc 
acagggagtg 
acacctataa 
tcgagaccag 
ctgaacatgg 
gcttgaactt 
gcctaggtga 
aaacccttag 
tatagactga 
agcagaagct 
gaaagaaggt 
aaatatatat 
aaagagagag 
cagcattgga 
aggtcaaatg 
attcttctcc 
aaatcttaaa 
aagaaaaaaa 
ataccacaaa 
gaaaacctga 
ttgtattttt 
tttcttcatg 
ccaataacaa 
tctgatggct 
aaattattaa 
ttttttctga 
ttaaaaagaa 
acaacaaatt 
tggaatttgt 
tgtccaatga 
ttctttttta 
tcatctgtta 
aacatgggag 
agtagtggga 
tagttagaaa 
tgataattta 
agagaaatga 



aagttaaaag 

tttgtttatg 

gcctcatggt 

attaaaacac 

taagaaagag 

ctcataaata 

gctgaatgta 

tcccagcatt 

cctggccaac 

tggcacatgc 

gggaggtgga 

cagaataaga 

acccaatgat 

aggtaaaggg 

acatttatat 

cattatatag 

tcacccaata 

atagacagac 

cagatcatcc 

gacctagtag 

tcagcacatg 

atttagaaaa 

gaagtcccaa 

aaattaaagg 

acaaaataga 

tcatagtacc 

gtttttgtat 

gtaatgagac 

tcactgatga 

aaaaatagag 

ttgaatctcc 

agaaaactgt 

agcaaactga 

cctagagatt 

acataatgtc 

tggctaagta 

gacacctaag 

tgtaaatatt 

ttgctggatc 

aaatgaatat 

ttatatatta 



tgatttagac 
gacttttcaa 
tggggagatg 
taatcagtgt 
aacctccaat 
accaccagag 
aaggccatca 
ataacattga 
ttttaaaaaa 
ttgggagact 
atggcaaaac 
ctgtggttcc 
ggttgcagtg 
ctctgcctca 
tcattgccta 
atggaaaaat 
cagacaaaat 
tgataaaggg 
ctggagtact 
ccccatacaa 
agacagaaaa 
atatttacag 
gataattctc 
aaagtgaaat 
ataaatacaa 
atcattagaa 
taattcctag 
atgaagaaat 
ttcttcatgg 
agaagccata 
attttgccaa 
gtggacagaa 
cattatattt 
aggccaatat 
attcaagaac 
caagtgtggt 
ctccagctcc 
gtactccatt 
ttgcttccaa 
ttgttgacat 
atatggggga 
gatttagtat 
taaaataact 



taaatgtttg aag 



atatatttag 

gatatagaca 

aagtctgatt 

taccagttta 

ctaaaacata 

aaaatcacct 

aataatcaga 

atgtaaatgg 

aatattacac 

gagccgggtg 

cctgtctcta 

agctactaga 

agctaagatt 

aaaaaaaaaa 

caagaagtat 

attctatgcc 

agactgcaag 

gtccatttag 

caggtatata 

taataactgg 

ttaacaaaca 

aacatttgat 

aaggatatac 

aatatcaaac 

tctgagataa 

gatactatga 

aaacatacaa 

acaagaattg 

aaccatgaag 

ctaaaaagta 

atatttaaaa 

tctttccaaa 

taatcacata 

ctctgatgaa 

acattaaaac 

taggtatgtg 

atccatgttc 

gtgtataagt 

atcttagcta 

actgatttca 

aaatggagat 

tcgatagcac 

aaaatagtat 



aaagaggaaa 

atacagttaa 

ttttggtttt 

aaataatggg

caacaaatac 

acattaaaag 

aaatgaataa 

actaagctct 

cgagctgtgc 

gatcacttga 

ctaaaaatac 

gaggctgagg 

gatggagcca 

gcaaaacaaa 

gcttcacctt 

tatggaaaca 

acaaaaacta 

caagagcatt 

aagcaaatat 

agacttcaac 

tcaaatttca 

ccaacagctg 

caaatgctag 

gttttctctc 

aaaaggagac 

aactatatgc 

catactggtc 

tttctagaac 

aaatacaaaa 

tcccagaaaa 

aactaatacc 

tgtattctat 

taaaaccaga 

cattgatgca 

aatcattcat 

cagatcaatg 

ttgcaaatga 

gccatatttt 

ttgtgaatag 

tttcctttgg 

ggctaacggg 

aataggatga 

aaatgggatg 



<210> 13 

<211> 3033 

<212> DNA 

<213> Homo sapiens 

<400> 13 

actaacataa agctgaaggt gaataaaaaa 
caaataccac ataaaaagta aatatactta 
acaaaatgct catttctcag tgtttgacag 
tgactagata aactactaaa agttgttaat 
ttatctatta tagaaattcc tgtgcccatt 
aatatagttt aattgcatca tgaaaataat 
aaaactgatt ctttctgctc tctctatata 
gatcaccaaa ttgtttgaag cctaggtttc 
tacagttcac acacacattc tggggattta 
aagattgatg atttggggga attagagcta 
tctgttgtga gctgtgtgaa tcagagagtt 
gatgaaccaa tcaaaaataa taactataat 
gatataaatg gaaacaaaaa aagttaaaag 
tttttttttt tgcttttttg tttgtttatg 
ttataagaca ctatatgcaa gcctcatggt 
acacaaaata aaaaggagaa attaaaacac 



atcagggtta 
agttcccagc 
acttaacagt 
ttttgcaatg 
taagaacttt 
caataataca 
tagactgatt 
tgagggatgg 
atacatcctt 
ccacacccca 
tgatttagac 
gacttttcaa 
tggggagatg 
taatcagtgt 
aacctccaat 



gccaaacaaa 
aaaatctgaa 
ttgagccaat 
tatatttctg 
gagcatttta 
atttatttgg 
ttatactaat 
aaaatgatgt 
tacaagtgca 
gagggtggta 
atatatttag 
gatatagaca 
aagtctgatt 
taccagttta 
ctaaaacata 



ttttcatggt 
ttgaacgtag 
aaaaatgtac 
aaaagaaagt 
attgtttaat 
tttatttaaa 
gttgcctaaa 
cacaactatt 
ggaaaggtgg 
tggtatgttg 
aaagaggaaa 
atacagttaa 
ttttggtttt 
aaataatggg 
caacaaatac 



660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3213 



accaccagag aaaatcacct acattaaaag 
aaaataatca
tctatatata 
taggtttctg 
gggatttaat 
tagagctacc 
agagagtttg 
actataatga 
gttaaaagtg 
tgtttatgta 
ctcatggtaa 
taaaacacac 



ataatacaat 
gactgatttt 
agggatggaa 
acatccttta 



acaccccaga 
atttagacat 
cttttcaaga 
gggagatgaa 
atcagtgtta 
cctccaatct



agaaagagaa 

cataaataat 

tgaatgtatt 

ccagcatttt 

tggccaacat 

gcacatgcct 

gaggtggagg 

gaataagact 

ccaatgattc 

gtaaagggat 

atttatatca 

ttatatagtg 

acccaatact 

agacagaccc 

gatcatccag 

cctagtagat

agcacatgga 

ttagaaaaaa 

agtcccaaat 

attaaaggat 

aaaatagata 

atagtaccat 

ttttgtattt 

aatgagacag 

actgatgaat 

aaatagaggt 

gaatctccca 

aaaactgtag 

caaactgaat 

tagagattca 

ataatgtcct 

gctaagtagt 

cacctaagtt 

taaatatttt 

gctggatcat 

atgaatatga 

atatattata 

aatgtttgaa 

aaatattcac 



caccagagaa 
ggccatcaaa 
aacattgaat 
ttaaaaaaaa 
gggagactga 
ggcaaaaccc 
gtggttccag 
ttgcagtgag 
ctgcctcaaa 
attgcctaca 
ggaaaaatat 
gacaaaatag 
ataaaggggt 
ggagtactca 
ccatacaata 
acagaaaatt 
atttacagaa 
taattctcaa 
agtgaaataa 
aaatacaatc 
cattagaaga 
attcctagaa 
gaagaaatac 
cttcatggaa 
aagccatact 
tttgccaaat 
ggacagaatc 
ttatatttta 
gccaatatct 
tcaagaacac 
agtgtggtta 
ccagctccat 
actccattgt 
gcttccaaat 
gttgacatac 
atgggggaaa 
tttagtattc 
aaataactaa 
gcattggata 
tattccataa 



ttatttggtt 
atactaatgt 
aatgatgtca 
caagtgcagg 
gggtggtatg 
atatttagaa 
tatagacaat 
gtctgatttt 
ccagtttaaa 
aaaacataca 
aatcacctac 
taatcagaaa 
gtaaatggac 
tattacaccg 
gccgggtgga 
tgtctctact 
ctactagaga 
ctaagattga 
aaaaaaaagc 
agaagtatgc 
tctatgccta 
actgcaagac 
ccatttagca 
ggtatataaa 
ataactggag 
aacaaacatc 
catttgatcc 
ggatatacca 
tatcaaacgt 
tgagataaaa 
tactatgaaa 
acatacaaca 
aagaattgtt 
ccatgaagaa 
aaaaagtatc 
atttaaaaaa 
tttccaaatg 
atcacatata 
ctgatgaaca 
attaaaacaa 
ggtatgtgca 
ccatgttctt 
gtataagtgc 
cttagctatt 
tgatttcatt 
atggagatgg 
gatagcacaa 
aatagtataa 
ctccatcacc 



caca 



tatttaaaaa 
tgcctaaaga 
caactattta 
aaaggtggaa 
gtatgttgtc 
agaggaaaga 
acagttaaga 
ttggtttttt 
ataatgggtt 
acaaatacac 
attaaaagaa 
atgaataaca 
taagctctcc 
agctgtgcgt 
tcacttgagc 
aaaaatacaa 
ggctgaggca 
tggagccact 
aaaacaaaac 
ttcaccttta 
tggaaacaaa 
aaaaactatg 
agagcattta 
gcaaatatta 
acttcaacac 
aaatttcatc 
aacagctgta 
aatgctaggt 
tttctctcac 
aaggagacga 
ctatatgcta 
tactggtctg 
tctagaacca 
atacaaaatg 
ccagaaaaga 
ctaataccaa 
tattctatga 
aaaccagaga 
ttgatgcaga 
tcattcatca 
gatcaatggg 
gcaaatgaca 
catattttct 
gtgaatagtg 
tcctttggat 
ctaacgggct 
taggatgact 
atgggatgta 
tgctgtgata 



aactgattct 
tcaccaaatt 
cagttcacac 
gattgatgat 
tgttgtgagc 
tgaaccaatc 
tataaatgga 
tttttttttg 
ataagacact 
acaaaataaa 
agacaaatag 
aaatgacagg 
aatgaaagac 
ggtgtctcac 
ccaggagttc 
aaaattagct 
gaagaattgc 
gcaccccagc 
aaaacaaaaa 
aagacacata 
caaaaagaag 
aaaagagaga 
acaattctaa 
ttagagccaa 
cccactttca 
tgcaccatag 
gaatacacat 
cacaaaacaa 
cacagactaa 
gacaaccaat 
ataaattgga 
ttcaggtttt 
ttcttgtatt 
tgaacaggcc 
actcaggatc 
tccaactcaa 
ggccagtgtt 
aagacacatt 
aatcctcaac 
tgaccaagtg 
tttaatgttg 
ggatctcatt 
ttatccattc 
ctgcaataaa 
aaatacccag 
caaaaatata 
actgttaatg 
tgtagcagag 
ttatgaatgt 



ttctgctctc 
gtttgaagcc 
acacattctg 
ttgggggaat 
tgtgtgaatc 
aaaaataata 
aacaaaaaaa 
cttttttgtt 
atatgcaagc 
aaggagaaat 
gaagaaaata 
aataagtcct 
agggagtggc 
acctataatc 
gagaccagcc 
gaacatggtg 
ttgaacttgg 
ctaggtgaca 
acccttagac 
tagactgaag
cagaagctac 
aagaaggtca 
atatatattc 
agagagagat 
gcattggaca 
gtcaaatgga 
tcttctcctc 
atcttaaaat 
gaaaaaaaga 
accacaaaaa 
aaacctgaac 
gtattttttc 
tcttcatggt 
aataacaagt 
tgatggcttc 
attattaaaa 
ttttctgatt 
aaaaagaaag 
aacaaattag 
gaatttgtcc 
tccaatgaac 
cttttttatg 
atctgttaga 
catgggagtg 
tagtgggatt 
gttagaaaaa 
ataatttatt 
agaaatgata 
ctgcctatat 



<210> 12 

<211> 3213 

<212> DNA 

<213> Homo sapiens 

<400> 12 

actaacataa agctgaaggt gaataaaaaa 
caaataccac ataaaaagta aatatactta 
acaaaatgct catttctcag tgtttgacag 
tgactagata aactactaaa agttgttaat 
ttatctatta tagaaattcc tgtgcccatt 
aatatagttt aattgcatca tgaaaataat 
aaaactgatt ctttctgctc tctctatata 
gatcaccaaa ttgtttgaag cctaggtttc 
tacagttcac acacacattc tggggattta 
aagattgatg atttggggga attagagcta 



atcagggtta 
agttcccagc 
acttaacagt 
ttttgcaatg 
taagaacttt 
caataataca 
tagactgatt 
tgagggatgg 
atacatcctt 
ccacacccca 



gccaaacaaa 
aaaatctgaa 
ttgagccaat 
tatatttctg 
gagcatttta 
atttatttgg 
ttatactaat 
aaaatgatgt 
tacaagtgca 
gagggtggta 



ttttcatggt 
ttgaacgtag 
aaaaatgtac 
aaaagaaagt 
attgtttaat 
tttatttaaa 
gttgcctaaa 
cacaactatt 
ggaaaggtgg 
tggtatgttg 



660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
2880 
2940 
3000 
3060 
3120 
3180 
3240 
3300 
3360 
3420 
3480 
3540 
3564 
ctcatggtaa
taaaacacac 
agaaagagaa 
cataaataat 
tgaatgtatt 
ccagcatttt
tggccaacat 
gcacatgcct 
gaggtggagg 
gaataagact 
ccaatgattc 
gtaaagggat 
atttatatca 
ttatatagtg 
acccaatact 
agacagaccc 
gatcatccag 
cctagtagat 
agcacatgga 
ttagaaaaaa 
agtcccaaat 
attaaaggat 
aaaatagata 
atagtaccat 
ttttgtattt 
aatgagacag 
actgatgaat 
aaatagaggt 
gaatctccca 
aaaactgtag 
caaactgaat 
tagagattca 
ataatgtcct 
gctaagtagt 
cacctaagtt 
taaatatttt 
gctggatcat 
atgaatatga 
atatattata 
aatgtttgaa 
aaatattcac 
aaagttgctc 
atttttaatt 
taatgtatag 
taaaatataa 
ggaaggaaac 
gtgtataaag
acgagtatgt
taaggtttag
ttgacctttg



cctccaatct 
caccagagaa 
ggccatcaaa 
aacattgaat 
ttaaaaaaaa 
gggagactga 
ggcaaaaccc 
gtggttccag 
ttgcagtgag 
ctgcctcaaa 
attgcctaca 
ggaaaaatat 
gacaaaatag 
ataaaggggt 
ggagtactca 
ccatacaata 
acagaaaatt 
atttacagaa 
taattctcaa 
agtgaaataa 
aaatacaatc 
cattagaaga 
attcctagaa 
gaagaaatac 
cttcatggaa 
aagccatact 
tttgccaaat 
ggacagaatc 
ttatatttta 
gccaatatct 
tcaagaacac 
agtgtggtta 
ccagctccat 
actccattgt 
gcttccaaat 
gttgacatac 
atgggggaaa 
tttagtattc 
aaataactaa 
gcattggata 
tattccataa 
taagaatata 
gttttatcat 
aatagagata 
tgggagattc 
aataatgaaa 
aaagcaaaaa 
tccctattta 
gctcacccat 



aaaacataca 
aatcacctac 
taatcagaaa 
gtaaatggac 
tattacaccg 
gccgggtgga 
tgtctctact 
ctactagaga 
ctaagattga 
aaaaaaaagc 
agaagtatgc 
tctatgccta 
actgcaagac 
ccatttagca 
ggtatataaa 
ataactggag 
aacaaacatc 
catttgatcc 
ggatatacca 
tatcaaacgt 
tgagataaaa 
tactatgaaa 
acatacaaca 
aagaattgtt 
ccatgaagaa 
aaaaagtatc 
atttaaaaaa 
tttccaaatg 
atcacatata 
ctgatgaaca 
attaaaacaa 
ggtatgtgca 
ccatgttctt 
gtataagtgc 
cttagctatt 
tgatttcatt 
atggagatgg 
gatagcacaa 
aatagtataa 
ctccatcacc 
cacagcgcct 
gttatcaagt 
tctttgcaat 
tacataggat 
aatcagaaaa 
aaaatgtggt 
gagaagtaga 
aggctaggca 
ttcaaccagt 




acaaatacac 
attaaaagaa 
atgaataaca 
taagctctcc 
agctgtgcgt 
tcacttgagc 
aaaaatacaa 
ggctgaggca 
tggagccact 
aaaacaaaac 
ttcaccttta 
tggaaacaaa 
aaaaactatg 
agagcattta 
gcaaatatta 
acttcaacac 
aaatttcatc 
aacagctgta 
aatgctaggt 
tttctctcac 



aaggagacga 
ctatatgcta 
tactggtctg 
tctagaacca 
atacaaaatg 
ccagaaaaga 
ctaataccaa 
tattctatga 
aaaccagaga 
ttgatgcaga 
tcattcatca 
gatcaatggg 
gcaaatgaca 
catattttct 
gtgaatagtg 
tcctttggat 
ctaacgggct 
taggatgact 
atgggatgta 
tgctgtgata 
cttatgtacc 
taagtaaaat 
aataaaacat 
atgtaaatag 
aagtttctaa 
gagaaaaaca 
aagtaacaca 
caaagcaagg 
ctagcagcat 



acaaaataaa 
agacaaatag 
aaatgacagg 
aatgaaagac 
ggtgtctcac 
ccaggagttc 
aaaattagct 
gaagaattgc 
gcaccccagc 
aaaacaaaaa 
aagacacata 
caaaaagaag 
aaaagagaga 
acaattctaa 
ttagagccaa 
cccactttca 
tgcaccatag 
gaatacacat 
cacaaaacaa 
cacagactaa 
gacaaccaat 
ataaattgga 
ttcaggtttt 
ttcttgtatt 
tgaacaggcc 
actcaggatc 
tccaactcaa 
ggccagtgtt 
aagacacatt 
aatcctcaac 
tgaccaagtg 
tttaatgttg 
ggatctcatt 
ttatccattc 
ctgcaataaa 
aaatacccag 
caaaaatata 
actgttaatg 
tgtagcagag 
ttatgaatgt 
cacaaaaatc 
gtcaatagcc 
taactttata 
atacacagtg 
aaaggctctg 
gctgaaaacc 
ggggcatttg 
tcttcagaga 
ctgcaacatc 



aaggagaaat 

gaagaaaata 

aataagtcct 

agggagtggc 

acctataatc

gagaccagcc 

gaacatggtg 

ttgaacttgg 

ctaggtgaca 

acccttagac 

tagactgaag 

cagaagctac 

aagaaggtca 

atatatattc 

agagagagat 

gcattggaca 

gtcaaatgga 

tcttctcctc 

atcttaaaat 



gaaaaaaaga 
accacaaaaa 
aaacctgaac 
gtattttttc 
tcttcatggt 
aataacaagt 
tgatggcttc 
attattaaaa 
ttttctgatt 
aaaaagaaag 
aacaaattag 
gaatttgtcc 
tccaatgaac 
cttttttatg 
atctgttaga 
catgggagtg 
tagtgggatt 
gttagaaaaa 
ataatttatt 
agaaatgata 
ctgcctatat 
tattttcaaa 
ttttaattta 
ctttttaatt 
tatatgtgat 
gggtaaaaga 
catgtaaaga 
gaaaatgtaa 
acctggagcc 
tacaatggcc 



<210> 11

<211> 3564

<212> DNA

<213> Homo sapiens

<400> 11
aagcttttat
ttcatttatc
tgtagacata
acaaaaagaa
tatattccat
ataaaaaaat
tatacttaag
tttgacagac
ttgttaattt
tgcccattta

aggtgtaaat 
tcaagatgtt 
catttttggc 
aaatgagaaa 
ttgtttcatc 
cagggttagc 
ttcccagcaa 
ttaacagttt 
ttgcaatgta 
agaactttga 



tttccactta 
ttctaatttc 
cctatgcatt 
gaaatatatt 
atattcatat 
caaacaaatt 
aatctgaatt 
gagccaataa 
tatttctgaa 
gcattttaat 



gtactgcttt 
tcttgacttc 
gggatgcaaa 
tggtcttgtg 
atccctttac 
ttcatggtca 
gaacgtagac 
aaatgtactg 
aagaaagttt 
tgtttaataa 



tgtaatgttg 
cttcttaaat 
accagactaa 
agcactatat 
taacataaag 
aataccacat 
aaaatgctca 
actagataaa 
atctattata 
tatagtttaa 



tctttttatt 
tcttacctca 
tttactttgt 
ggaaatactt 
ctgaaggtga 
aaaaagtaaa 
tttctcagtg 
ctactaaaag 
gaaattcctg 
ttgcatcatg 
ATTAAAGGAT CATTAGAAGA TACTATGAAA
AAAATAGATA ATTCCTAGAA ACATACAACA 
ATAGTACCAT GAAGAAATAC AAGAATTGTT 
TTTTGTATTT CTTCATGGAA CCATGAAGAA 
AATGAGACAG AAGCCATACT AAAAAGTATC
ACTGATGAAT TTTGCCAAAT ATTTAAAAAA 
AAATAGAGGT GGACAGAATC TTTCCAAATG 
GAATCTCCCA TTATATTTTA ATCACATATA 
AAAACTGTAG GCCAATATCT CTGATGAACA 
CAAACTGAAT TCAAGAACAC ATTAAAACAA 
TAGAGATTCA AGTGTGGTTA GGTATGTGCA
ATAATGTCCT CCAGCTCCAT CCATGTTCTT 
GCTAAGTAGT ACTCCATTGT GTATAAGTGC 
CACCTAAGTT GCTTCCAAAT CTTAGCTATT 
TAAATATTTT GTTGACATAC TGATTTCATT 
GCTGGATCAT A 



CTATATGCTA ATAAATTGGA AAACCTGAAC 
TACTGGTCTG TTCAGGTTTT GTATTTTTTC 
TCTAGAACCA TTCTTGTATT TCTTCATGGT 
ATACAAAATG TGAACAGGCC AATAACAAGT 
CCAGAAAAGA ACTCAGGATC TGATGGCTTC 
CTAATACCAA TCCAACTCAA ATTATTAAAA
TATTCTATGA GGCCAGTGTT TTTTCTGATT 
AAACCAGAGA AAGACACATT AAAAAGAAAG 
TTGATGCAGA AATCCTCAAC AACAAATTAG
TCATTCATCA TGACCAAGTG GAATTTGTCC 
GATCAATGGG TTTAATGTTG TCCAATGAAC 
GCAAATGACA GGATCTCATT CTTTTTTATG 
CATATTTTCT TTATCCATTC ATCTGTTAGA 
GTGAATAGTG CTGCAATAAA CATGGGAGTG
TCCTTTGGAT AAATACCCAG TAGTGGGATT 



FIG. 7B 
ATAAAAAAAT CAGGGTTAGC
TATACTTAAG TTCCCAGCAA
TTTGACAGAC TTAACAGTTT
TTGTTAATTT TTGCAATGTA
TGCCCATTTA AGAACTTTGA
AAAATAATCA ATAATACAAT
TCTATATATA GACTGATTTT
TAGGTTTCTG AGGGATGGAA
GGGATTTAAT ACATCCTTTA
TAGAGCTACC ACACCCCAGA
AGAGAGTTTG ATTTAGACAT
ACTATAATGA CTTTTCAAGA
GTTAAAAGTG GGGAGATGAA
TGTTTATGTA ATCAGTGTTA
CTCATGGTAA CCTCCAATCT
TAAAACACAC CACCAGAGAA
AGAAAGAGAA GGCCATCAAA
CATAAATAAT AACATTGAAT
TGAATGTATT TTAAAAAAAA
CCAGCATTTT GGGAGACTGA
TGGCCAACAT GGCAAAACCC
GCACATGCCT GTGGTTCCAG
GAGGTGGAGG TTGCAGTGAG
GAATAAGACT CTGCCTCAAA
CCAATGATTC ATTGCCTACA
GTAAAGGGAT GGAAAAATAT
ATTTATATCA GACAAAATAG
TTATATAGTG ATAAAGGGGT
ACCCAATACT GGAGTACTCA
AGACAGACCC CCATACAATA
GATCATCCAG ACAGAAAATT
CCTAGTAGAT ATTTACAGAA
AGCACATGGA TAATTCTCAA
TTAGAAAAAA AGTGAAATAA
AGTCCCAAAT AAATACAATC



CAAACAAATT TTCATGGTCA
AATCTGAATT GAACGTAGAC
GAGCCAATAA AAATGTACTG
TATTTCTGAA AAGAAAGTTT
GCATTTTAAT TGTTTAATAA
TTATTTGGTT TATTTAAAAA
ATACTAATGT TGCCTAAAGA
AATGATGTCA CAACTATTTA
CAAGTGCAGG AAAGGTGGAA
GGGTGGTATG GTATGTTGTC
ATATTTAGAA AGAGGAAAGA
TATAGACAAT ACAGTTAAGA
GTCTGATTTT TTGGTTTTTT
CCAGTTTAAA ATAATGGGTT
AAAACATACA ACAAATACAC
AATCACCTAC ATTAAAAGAA
TAATCAGAAA ATGAATAACA
GTAAATGGAC TAAGCTCTCC
TATTACACCG AGCTGTGCGT
GCCGGGTGGA TCACTTGAGC
TGTCTCTACT AAAAATACAA
CTACTAGAGA GGCTGAGGCA
CTAAGATTGA TGGAGCCACT
AAAAAAAAGC AAAACAAAAC
AGAAGTATGC TTCACCTTTA
TCTATGCCTA TGGAAACAAA
ACTGCAAGAC AAAAACTATG
CCATTTAGCA AGAGCATTTA
GGTATATAAA GCAAATATTA
ATAACTGGAG ACTTCAACAC
AACAAACATC AAATTTCATC
CATTTGATCC AACAGCTGTA
GGATATACCA AATGCTAGGT
TATCAAACGT TTTCTCTCAC
TGAGATAAAA AAGGAGACGA

FIG. 7A



TAACATAAAG CTGAAGGTGA
AATACCACAT AAAAAGTAAA
AAAATGCTCA TTTCTCAGTG
ACTAGATAAA CTACTAAAAG
ATCTATTATA GAAATTCCTG
TATAGTTTAA TTGCATCATG
AACTGATTCT TTCTGCTCTC
TCACCAAATT GTTTGAAGCC
CAGTTCACAC ACACATTCTG
GATTGATGAT TTGGGGGAAT
TGTTGTGAGC TGTGTGAATC
TGAACCAATC AAAAATAATA
TATAAATGGA AACAAAAAAA
TTTTTTTTTG CTTTTTTGTT
ATAAGACACT ATATGCAAGC
ACAAAATAAA AAGGAGAAAT
AGACAAATAG GAAGAAAATA
AAATGACAGG AATAAGTCCT
AATGAAAGAC AGGGAGTGGC
GGTGTCTCAC ACCTATAATC
CCAGGAGTTC GAGACCAGCC
AAAATTAGCT GAACATGGTG
GAAGAATTGC TTGAACTTGG
GCACCCCAGC CTAGGTGACA
AAAACAAAAA ACCCTTAGAC
AAGACACATA TAGACTGAAG
CAAAAAGAAG CAGAAGCTAC
AAAAGAGAGA AAGAAGGTCA
ACAATTCTAA ATATATATTC
TTAGAGCCAA AGAGAGAGAT
CCCACTTTCA GCATTGGACA
TGCACCATAG GTCAAATGGA
GAATACACAT TCTTCTCCTC
CACAAAACAA ATCTTAAAAT
CACAGACTAA GAAAAAAAGA
GACAACCAAT ACCACAAAAA
<400> 8

cattggatac tccatcacct gctgtgatat tatgaatgtc tgcctatata aatattcact 
attccataac aca 

<210> 9 

<211> 1806 

<212> DNA 

<213> Homo sapiens 



60 
73 



<400> 9 

cattggatac tccatcacct gctgtgatat tatgaatgtc 
attccataac acagcgcctc ttatgtaccc acaaaaatct 
aagaatatag ttatcaagtt aagtaaaatg tcaatagcct 
ttttatcatt ctttgcaata ataaaacatt aactttatac 
atagagatat acataggata tgtaaataga tacacagtgt 
gggagattca atcagaaaaa agtttctaaa aaggctctgg 
ataatgaaaa aaatgtggtg agaaaaacag ctgaaaaccc 
aagcaaaaag agaagtagaa agtaacacag gggcatttgg 
ccctatttaa ggctaggcac aaagcaaggt cttcagagaa 
ctcacccatt tcaaccagtc tagcagcatc tgcaacatct 
tttactggtg gccctcctgg tgctcagctg caagtcaagc 
gcctcaaacc cacagcctgg gtagcaggag gaccttgatg 
aatctctctt ttctcctgct tgaaggacag acatgacttt 
tggcaaccag ttccaaaagg ctgaaaccat ccctgtcctc 
cttcaatctc ttcagcacaa aggactcatc tgctgcttgg 
attctacact gaactctacc agcagctgaa tgacctggaa 
gggggtgaca gagactcccc tgatgaagga ggactccatt 
ccaaagaatc actctctatc tgaaagagaa gaaatacagc 
cagagcagaa atcatgagat ctttttcttt gtcaacaaac 
taaggaatga aaactggttc aacatggaaa tgattttcat 
ctttttatga tctgccattt caaagactca tgtttctgct 
atcttttcaa atgtttttag gagtattaat caacattgta 
tcccttacag aggaccatgc tgactgatcc attatctatt 
tttatttaac tatttataaa acaacttatt tttgttcata 
cacagtggtt aatgtaataa aatgtgttct ttgtatttgg 
cattgaactt ttgctatgga acttttgtac ttgtttattc 
ctaattgtgc aacctgatta cagaataact ggtacacttc 
ttcaagatat aagtaaaaat aaactttctg taaaccaagt 
acagggtgaa cctaacaaat acaattctgc tctcttgtgt 
aaactaaaaa tggtaatcat acttaattat cagttatggt 
ggaacg 

<210> 10 

<211> 4090 

<212> DNA 

<213> Homo sapiens 



tgcctatata 

attttcaaaa 

tttaatttaa 

tttttaattt 

atatgtgatt 

ggtaaaagag 

atgtaaagag 

aaaatgtaaa 

cctggagcct 

acaatggcct 

tgctctgtgg 

ctcctggcac 

ggatttcccc 

catgagatga 

gatgagaccc 

gcctgtgtga 

ctggctgtga 

ccttgtgcct 

ttgcaagaaa 

tgattcgtat 

atgaccatga 

ttcagctctt 

taaatatttt 

ttatgtcatg 

taaatttatt 

tttaaaatga 

atttgtccat 

tgtatgttgt 

atttgatttt 

aaatggtatg 



aatattcact 
aagttgctct 
tttttaattg 
aatgtataga 
aaaatataat 
gaaggaaaca 
tgtataaaga 
cgagtatgtt 
aaggtttagg 
tgacctttgc 
gctgtgatct 
agatgaggag 
aggaggagtt 
tccagcagat 
tcctagacaa 
tacagggggt 
ggaaatactt 
gggaggttgt 
gtttaagaag 
gccagctcac 
cacgatttaa 
aaggcactag 
taaaatatta 
tgcacctttg 
ttgtgttgtt 
aattccaagc 
caatattata 
actcaagata 
tgtatgaaaa 
aagagaagaa 



<400> 10 

aagcttttat aggtgtaaat tttccactta 
ttcatttatc tcaagatgtt ttctaatttc 
tgtagacata catttttggc cctatgcatt 
acaaaaagaa aaatgagaaa gaaatatatt 
tatattccat ttgtttcatc atattcatat 
ataaaaaaat cagggttagc caaacaaatt 
tatacttaag ttcccagcaa aatctgaatt 
tttgacagac ttaacagttt gagccaataa 
ttgttaattt ttgcaatgta tatttctgaa 
tgcccattta agaactttga gcattttaat 
aaaataatca ataatacaat ttatttggtt 
tctatatata gactgatttt atactaatgt 
taggtttctg agggatggaa aatgatgtca 
gggatttaat acatccttta caagtgcagg 
tagagctacc acaccccaga gggtggtatg 
agagagtttg atttagacat atatttagaa 
actataatga cttttcaaga tatagacaat 
gttaaaagtg gggagatgaa gtctgatttt 
tgtttatgta atcagtgtta ccagtttaaa 



gtactgcttt 
tcttgacttc 
gggatgcaaa 
tggtcttgtg 
atccctttac 
ttcatggtca 
gaacgtagac 
aaatgtactg 
aagaaagttt 
tgtttaataa 
tatttaaaaa 
tgcctaaaga 
caactattta 
aaaggtggaa 
gtatgttgtc 
agaggaaaga 
acagttaaga 
ttggtttttt 
ataatgggtt 



tgtaatgttg 
cttcttaaat 
accagactaa 
agcactatat 
taacataaag 
aataccacat 
aaaatgctca 
actagataaa 
atctattata 
tatagtttaa 
aactgattct 
tcaccaaatt 
cagttcacac 
gattgatgat 
tgttgtgagc 
tgaaccaatc 
tataaatgga 
tttttttttg 
ataagacact 



tctttttatt 
tcttacctca 
tttactttgt 
ggaaatactt 
ctgaaggtga 
aaaaagtaaa 
tttctcagtg 
ctactaaaag 
gaaattcctg 
ttgcatcatg 
ttctgctctc 
gtttgaagcc 
acacattctg 
ttgggggaat 
tgtgtgaatc 
aaaaataata 
aacaaaaaaa 
cttttttgtt 
atatgcaagc 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1806 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 



Leu Phe Ser Cys Leu Lys Asp Arg His Asp Phe Gly Phe Pro Gln Glu
50 55 60

Glu Phe Gly Asn Gln Phe Gln Lys Ala Glu Thr Ile Pro Val Leu His
65 70 75 80

Glu Met Ile Gln Gln Ile Phe Asn Leu Phe Ser Thr Lys Asp Ser Ser
85 90 95

Ala Ala Trp Asp Glu Thr Leu Leu Asp Lys Phe Tyr Thr Glu Leu Tyr
100 105 110

Gln Gln Leu Asn Asp Leu Glu Ala Cys Val Ile Gln Gly Val Gly Val
115 120 125

Thr Glu Thr Pro Leu Met Lys Glu Asp Ser Ile Leu Ala Val Arg Lys
130 135 140

Tyr Phe Gln Arg Ile Thr Leu Tyr Leu Lys Glu Lys Lys Tyr Ser Pro
145 150 155 160

Cys Ala Trp Glu Val Val Arg Ala Glu Ile Met Arg Ser Phe Ser Leu
165 170 175

Ser Thr Asn Leu Gln Glu Ser Leu Arg Ser Lys Glu
180 185

<210> 3 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<400> 3 

agtttctaaa aaggctctgg ggta 24 

<210> 4 

<211> 19 

<212> DNA 

<213> Homo sapiens 

<400> 4 

gcccacagag cagcttgac 19 

<210> 5 

<211> 25 

<212> DNA 

<213> Homo sapiens 

<400> 5
aaagactcat gtttctgcta tgacc 25 

<210> 6 

<211> 24 

<212> DNA 

<213> Homo sapiens 

<400> 6 

ggtgcacatg acataatatg aaca 24 

<210> 7 

<211> 278 

<212> DNA 

<213> Homo sapiens 

<400> 7 

aagcttttat aggtgtaaat tttccactta gtactgcttt tgtaatgttg tctttttatt 60 

ttcatttatc tcaagatgtt ttctaatttc tcttgacttc cttcttaaat tcttacctca 120 

tgtagacata catttttggc cctatgcatt gggatgcaaa accagactaa tttactttgt 180 

acaaaaagaa aaatgagaaa gaaatatatt tggtcttgtg agcactatat ggaaatactt 240 

tatattccat ttgtttcatc atattcatat atcccttt 278 

<210> 8 

<211> 73 

<212> DNA 

<213> Homo sapiens 
SEQUENCE LISTING
<110> Transkaryotic Therapies Inc. 

<120> GENOMIC SEQUENCES FOR PROTEIN PRODUCTION AND DELIVERY 
<130> 07236/018WO1 

<150> US 60/086,555 
<151> 1998-05-21 

<150> US 60/084,648 
<151> 1998-05-07 

<160> 19 

<170> FastSEQ for Windows Version 3.0 

<210> 1 

<211> 1733 

<212> DNA 

<213> Homo sapiens 

<400> 1 

gcgcctctta tgtacccaca aaaatctatt ttcaaaaaag ttgctctaag aatatagtta 60 

tcaagttaag taaaatgtca atagcctttt aatttaattt ttaattgttt tatcattctt 120 

tgcaataata aaacattaac tttatacttt ttaatttaat gtatagaata gagatataca 180 

taggatatgt aaatagatac acagtgtata tgtgattaaa atataatggg agattcaatc 240 

agaaaaaagt ttctaaaaag gctctggggt aaaagaggaa ggaaacaata atgaaaaaaa 300 

tgtggtgaga aaaacagctg aaaacccatg taaagagtgt ataaagaaag caaaaagaga 360 

agtagaaagt aacacagggg catttggaaa atgtaaacga gtatgttccc tatttaaggc 420 

taggcacaaa gcaaggtctt cagagaacct ggagcctaag gtttaggctc acccatttca 480 

accagtctag cagcatctgc aacatctaca atggccttga cctttgcttt actggtggcc 540 

ctcctggtgc tcagctgcaa gtcaagctgc tctgtgggct gtgatctgcc tcaaacccac 600 

agcctgggta gcaggaggac cttgatgctc ctggcacaga tgaggagaat ctctcttttc 660 

tcctgcttga aggacagaca tgactttgga tttccccagg aggagtttgg caaccagttc 720 

caaaaggctg aaaccatccc tgtcctccat gagatgatcc agcagatctt caatctcttc 780 

agcacaaagg actcatctgc tgcttgggat gagaccctcc tagacaaatt ctacactgaa 840 

ctctaccagc agctgaatga cctggaagcc tgtgtgatac agggggtggg ggtgacagag 900 
actcccctga tgaaggagga ctccattctg gctgtgagga aatacttcca aagaatcact 960

ctctatctga aagagaagaa atacagccct tgtgcctggg aggttgtcag agcagaaatc 1020 

atgagatctt tttctttgtc aacaaacttg caagaaagtt taagaagtaa ggaatgaaaa 1080 

ctggttcaac atggaaatga ttttcattga ttcgtatgcc agctcacctt tttatgatct 1140 

gccatttcaa agactcatgt ttctgctatg accatgacac gatttaaatc ttttcaaatg 1200

tttttaggag tattaatcaa cattgtattc agctcttaag gcactagtcc cttacagagg 1260 

accatgctga ctgatccatt atctatttaa atatttttaa aatattattt atttaactat 1320 

ttataaaaca acttattttt gttcatatta tgtcatgtgc acctttgcac agtggttaat 1380 

gtaataaaat gtgttctttg tatttggtaa atttattttg tgttgttcat tgaacttttg 1440 

ctatggaact tttgtacttg tttattcttt aaaatgaaat tccaagccta attgtgcaac 1500 

ctgattacag aataactggt acacttcatt tgtccatcaa tattatattc aagatataag 1560 

taaaaataaa ctttctgtaa accaagttgt atgttgtact caagataaca gggtgaacct 1620 

aacaaataca attctgctct cttgtgtatt tgatttttgt atgaaaaaaa ctaaaaatgg 1680 

taatcatact taattatcag ttatggtaaa tggtatgaag agaagaagga acg 1733

<210> 2 

<211> 188 

<212> PRT 

<213> Homo sapiens 

<400> 2 

Met Ala Leu Thr Phe Ala Leu Leu Val Ala Leu Leu Val Leu Ser Cys 

1 5 10 15 

Lys Ser Ser Cys Ser Val Gly Cys Asp Leu Pro Gln Thr His Ser Leu

20 25 30 

Gly Ser Arg Arg Thr Leu Met Leu Leu Ala Gln Met Arg Arg Ile Ser
35 40 45 



TAAGGTTTAG GCTCACCCAT TTCAACCAGT
TTGACCTTTG CTTTACTGGT GGCCCTCCTG

GGC 



Cap 

GAGA ACCTGGAGCC 
ATG (1) 

CTAGCAGCAT CTGCAACATC TACAATGGCC
GTGCTCAGCT GCAAGTCAAG CTGCTCTGTG



FIG. 8 